DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The amendment filed 04/12/2022 has been entered. The claim status is as follows:
Claims 1-12, 15-23 are pending.

Response to Arguments
	Applicant’s arguments with respect to the objections to Claims 8, 10, and 12, see pages 17-18, sec. Objections to the Claims, have been fully considered and are persuasive. The objections to Claims 8, 10, and 12 have been withdrawn.
	Applicant’s arguments with respect to the objections to the specification, see page 18, sec. Objections to the Specification, have been fully considered and are persuasive. The objections to the specification have been withdrawn.
	Applicant’s arguments with respect to the 112 rejections for Claims 2-4, 6, and 17, see page 18, sec. Rejections under 35 U.S.C. §112, have been fully considered and are persuasive. The 112 rejections for Claims 2-4, 6, and 17 have been withdrawn.
	Applicant’s arguments with respect to the 101 rejections for Claims 1-12 and 15-20, see pages 18-19, sec. Rejections under 35 U.S.C. §101, have been fully considered and are persuasive. The 101 rejections for Claims 1-12 and 15-20 have been withdrawn.
	Applicant’s arguments with respect to the 103 rejections for Claims 1-20, see page 19, sec. Rejections under 35 U.S.C. §103, have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 9, 11, 15-17, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalla et al. (US20150135012A1) in view of Xu et al. (“Smart Real Estate Assessments Using Structured Deep Neural Networks”) (hereinafter Xu).

Regarding Claim 1:
Bhalla teaches a system for predicting network node failure in a communication network. Bhalla teaches:  
“A network management computer system comprising: a network metrics repository that stores performance metrics that are measured during operation of a communication network, the network metrics repository further storing fault values which indicate whether defined types of network operation faults have occurred, (Bhalla discloses a network node performance analytic record (NPAR) that stores performance metrics that were captured in real-time and fault values. [Bhalla ¶14: lines 15-20: “A network infrastructure may include a set of network nodes providing voice and data services over a network to customer premises. A service as used herein may include Supplying or providing information over a network, and is also referred to as a communications network service”, ¶21: lines 7-15: “The predictive model of the disclosed examples may process the vast amount of data that is captured from thousands of different sources in approximately real-time to provide the network provider with a prediction of a network node failure at least 48 hours in advance. For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real-time”, ¶41: lines 1-6: “In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.”]) 
wherein the communication network comprises a plurality of network nodes that receive and forward communication packets;”  (Bhalla teaches a communication network comprising nodes that receive and forward information, which is what packets consist of. [Bhalla ¶14: 7-10: “For example, a network node may be a communication device that is capable of sending, receiving, or forwarding information over a communications channel in a network.”]).
“and at least one processor coupled to the network metrics repository and to the neural network circuit, the at least one processor configured to:” (Bhalla discloses a processor coupled to a data store and a forecasting engine that comprises a neural network. [Bhalla Fig. 2A: 202, 204, 210, 216; ¶29: lines 1-4: “The processor 202, which may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), or the like, is to perform various processing functions in the predictive analytics server 110.”; ¶16: lines 2-6: “The plurality of models refers to any predictive model that may be used to forecast a probability of an outcome such as, for example, a decision tree model, a neural network model, a logistic regression model, etc.”]).
“and control operation of the communication network based on output of the output node of the neural network circuit, the output node providing the output responsive to processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network, wherein an operation to control operation of the communication network comprises at least one of shifting communication packet traffic away from one of the network nodes toward one or more other ones of the network nodes or communicating a command to one of the network nodes instructing the network node to reboot at least a portion of executable operation code of the network node responsive to the measured performance metrics characterizing operation of the one of the network nodes and the output of the output node of the neural network circuit indicating at least a threshold likelihood of a fault in operation of the one of the network nodes.” (Bhalla discloses a provisioning server that controls operation of the communication network by remotely shifting communication packet traffic depending on the output of the forecasting engine, which comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”, ¶31: 12-22: “Furthermore, the system, for example, includes a network provisioning subsystem to provision network nodes based on proactive failure prediction. Provisioning may include configuring network node parameters to provide needed bandwidth and accommodate quality of service (QoS) requirements for the services provided by the network nodes.”]).
	However, Bhalla does not explicitly teach the specifics of the operation of the neural network. Bhalla does not teach the rest of the claimed limitations.
	Xu teaches:
“a neural network circuit having an input layer having input nodes, a sequence of hidden layers each having a plurality of combining nodes, and an output layer having an output node;” (Examiner notes that a combining node is a node that combines multiple values into one value through any method (e.g. multiplication or addition). Examiner further notes that any nodes with weights are combining nodes as they combine the input to the node with the weight. Xu discloses an input layer, hidden layers, and an output layer in sec. V. C para 1, “According to Algorithm 2 presented in Section IV.A, the structured DNN has four layers: an input layer, two hidden layers, and an output layer.” Xu discloses the input layer having input nodes (equivalent to neurons) in sec. V. C para 1, “Note that the first layer of the network contains 15 input neurons, which always produce outputs, as there are no biases (thresholds) connected to the input layer neurons.” Xu discloses that each layer has nodes in sec. III para 5, “Then we design a fully-connected DNN with k layers and nl neurons in each layer, where 1 ≤ l ≤ k, and nl and nk are the number of predefined input features and the number of outputs, respectively.” Xu discloses combining nodes in sec. III para 6, “During each iteration, based on the range and distribution of the weights of the links in the DNN, we define a threshold for each hidden node (i.e., a node belonging to layer l, where 1 < l < k). […] Let s and d be the source node and destination node of a link ϛ, respectively, and w(ϛ) be the connection weight of the link.”)
“generate forecasted performance metrics based on extrapolating from measured performance metrics in the network metrics repository;” (Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Image: media_image1.png]
).

“provide to the input nodes of the neural network circuit the forecasted performance metrics and the measured performance metrics;” (Xu discloses providing to the input of the neural network the forecasted metrics and measured metrics in Fig. 1, shown highlighted below:

[Image: media_image2.png]
).
 
“adapt weights and/or firing thresholds that are used by at least the input nodes of the neural network circuit responsive to real-time feedback of an output value of the output node of the neural network to reduce an error value based on comparison of the output value of the output node to at least one of the fault values from the network metrics repository;” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.” Xu discloses that the neural network is responsive to real-time feedback in sec. IV. B para 2, “ Each time when new labeled data points are added into the data pool, the structured DNN is incrementally trained and learns from the new data points in real time.” Xu discloses increasing accuracy (which is equivalent to reducing error) based on a comparison between the output value of the output node and the values from data, shown highlighted below in Algorithm 1: 

[Image: media_image3.png]
). 
Bhalla, Xu, and the instant application are analogous art because they are directed to prediction systems using neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla to include the specifics of neural network operation, as taught by Xu. One would be motivated to do so to improve upon conventional fully-connected neural networks, as suggested by Xu (Xu Abstract: “The experimental results show that a structured DNN outperforms conventional multivariate linear regression models, fully-connected neural networks, and prediction methods used by the leading real estate companies.”). 
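For illustration only, the arrangement discussed for Claim 1 can be sketched in code. The sketch below is a hypothetical simplification and is not code disclosed by Bhalla, Xu, or the instant application: the names `extrapolate`, `TinyNet`, and `control_action`, the two-point linear extrapolation, the sigmoid activation, the learning rate, and the 0.8 likelihood threshold are all assumptions chosen for brevity. Only weights are adapted here; firing thresholds are omitted.

```python
import math
import random


def extrapolate(history):
    """Forecast the next value by projecting the most recent step forward."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Input nodes -> one hidden layer of combining nodes -> one output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights (the range is an assumption).
        self.w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]

    def forward(self, x):
        # Each combining node forms a weighted sum of its inputs.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w_hidden]
        y = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)))
        return hidden, y

    def adapt(self, x, fault_value, lr=0.5):
        """One gradient step on the squared error against a stored fault value.

        Returns the squared error measured before the weights are updated.
        """
        hidden, y = self.forward(x)
        error = y - fault_value
        delta_out = error * y * (1.0 - y)
        for j, h in enumerate(hidden):
            # Hidden-node delta is computed before w_out[j] is changed.
            delta_hidden = delta_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= lr * delta_out * h
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * delta_hidden * xi
        return error * error


def control_action(likelihood, threshold=0.8):
    """Map the output node's value to a control decision (threshold assumed)."""
    return "shift_traffic" if likelihood >= threshold else "no_action"
```

In this sketch a measured metric and its extrapolated forecast would be supplied together as the input vector, for example `[measured, extrapolate(history)]`, and repeated calls to `adapt` compare the output node's value with a recorded fault value.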

Regarding Claim 2:
	Bhalla in view of Xu teaches “The network management computer system of Claim 1” as seen above.
Bhalla teaches:
“wherein: the network metrics repository stores the performance metrics that are measured during operation of the communication network and which are correlated to time sequence indicators for defined types of network operation performance characteristics,” (Bhalla discloses a network node performance analytic record (NPAR) that stores performance metrics that were captured in real-time and are associated with dates, which are time sequence indicators. [Bhalla ¶38: 1-11: “At block 410, the performance extractor 212 of the predictive analytics server 110 may aggregate performance metrics for network nodes in the network infrastructure to create a network node performance analytic record (NPAR).”, ¶21: 11-14: “For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real time.”, ¶38: 14-15: “The performance metrics in the NPAR may be divided by date (e.g., days)”]).
“the network metrics repository further stores fault values that are correlated to the time sequence indicators and which indicate whether defined types of network operation faults have occurred;” (The NPAR further stores fault values that are labelled with their corresponding date. [Bhalla ¶41: lines 1-6: “In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.” and Fig. 5]).   
“the at least one processor is further configured to: 19Attorney Docket 1100-180143/US20180143”  (Bhalla discloses a processor in Fig. 2B, shown highlighted below:
 
[Image: media_image4.png]
).
“and control operation of the communication network based on further output of the output node of the neural network circuit, the output node providing the further output responsive to processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network.” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine, which comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
	Xu teaches:
“repeat operations for an ordered series of the time sequence indicators to:” (Xu discloses repeating training for new incoming data which is tied to time indicators in sec. IV. B para 2, “When new data points become available, they are added into the data pool once they are properly labeled; meanwhile, the old data points, which are now shifted out of the window, must be removed from the data pool. Each time when new labeled data points are added into the data pool, the structured DNN is incrementally trained and learns from the new data points in real time.  Since the structured DNN is always trained using the most recent data points within the window, it is able to follow new market trends, and would produce more accurate and reliable assessments for time-sensitive products, such as real properties.”)  
“for at least one of the defined types of [network operation performance] characteristics, generate a forecasted performance metric based on extrapolating from a sequence of the measured performance metrics in the [network] metrics repository that are for the type of [network operation performance] characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series;” (Examiner notes that the combination of Bhalla and Xu teach this limitation as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Image: media_image1.png]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months.”)
“provide to the input nodes of the neural network circuit the forecasted performance metrics and the measured performance metrics that are correlated to the time sequence indicator in the ordered series;” (Xu discloses providing to the input of the neural network the forecasted metrics and measured metrics which are associated with time indicators in Fig. 1, shown highlighted below:

[Image: media_image2.png]
).
 
“determine the error value based on comparison of an output value of the output node of the neural network circuit to at least one of the fault values from the network metrics repository that is correlated to the time sequence indicator in the ordered series;” (Xu discloses increasing accuracy (which is equivalent to reducing error) based on a comparison between the output value of the output node and the values from data, shown highlighted below in Algorithm 1: 

[Image: media_image3.png]
).
 
“and adapt weights and/or firing thresholds, which are used by at least the input nodes of the neural network circuit to generate outputs to the combining nodes of a first one of the sequence of the hidden layers, to reduce the error value;” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
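For illustration only, the repeated, time-ordered operation discussed for Claim 2 can be sketched as follows. This is a hypothetical simplification, not code from Xu or Bhalla: the class name `DataPool`, the function `forecast_series`, the fixed window length, and the two-point linear extrapolation are assumptions. The pool mirrors the sliding window described in Xu sec. IV. B, in which old data points shifted out of the window are removed when new labeled points arrive.

```python
from collections import deque


class DataPool:
    """Sliding-window pool: the oldest point drops out when a new one arrives."""

    def __init__(self, window):
        self.points = deque(maxlen=window)

    def add(self, time_index, features, label):
        self.points.append((time_index, features, label))

    def training_batch(self):
        # Points are returned in time order, oldest first.
        return list(self.points)


def forecast_series(measured):
    """For each time index t >= 2, extrapolate from the two preceding values."""
    forecasts = {}
    for t in range(2, len(measured)):
        step = measured[t - 1] - measured[t - 2]
        forecasts[t] = measured[t - 1] + step
    return forecasts
```

Each forecast for index `t` depends only on measurements at indices that precede `t`, which is the ordering relationship the claim language recites.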

Regarding Claim 5:
Bhalla in view of Xu teaches “The network management computer system of Claim 2” as seen above.
Xu further teaches:
“wherein the neural network circuit is configured to: operate the input nodes of the input layer to each receive different ones of the forecasted performance metrics and the measured performance metrics that are correlated to the time sequence indicator in the ordered series, (Xu discloses input nodes receiving metrics in sec. V. B para 1, “In this selected neural network, the input layer contains 15 neurons, representing the predefined input features for house value assessments.” Xu discloses the neural network receiving forecasted and measured metrics in Fig. 1, shown below:

[Image: media_image5.png]
).

each of the input nodes multiplying metric values that are inputted by a weight that is assigned to the input node to generate a weighted metric value, (Xu discloses that each value of a node is associated with a weight in sec. III para 5, “Then we design a fully-connected DNN with k layers and nl neurons in each layer, where 1 ≤ l ≤ k, and nl and nk are the number of predefined input features and the number of outputs, respectively. After training the model, we identify the links with weak weights that are below a predefined threshold. We call such links weak links, which contribute less to the next layer than a normal link.” Xu discloses that the initial weights are assigned in sec. IV. B para 2, “Before starting the training process, the structured DNN is initialized with random weights.”).
and when the weighted metric value exceeds a firing threshold assigned to the input node to then provide the weighted metric value to the combining nodes of the first one of the sequence of the hidden layers;” (Xu discloses in sec. III para 6, “During each iteration, based on the range and distribution of the weights of the links in the DNN, we define a threshold for each hidden node (i.e., a node belonging to layer l, where 1 < l < k). A threshold α of a node d is chosen such that only major contributors of node d (i.e., nodes from a previous layer with strong links to node d) could be added into a set of contributing nodes Γ for node d. Let s and d be the source node and destination node of a link ϛ, respectively, and w(ϛ) be the connection weight of the link. If the absolute value of w(ϛ) is greater than or equal to α, then the contributing node s is added into Γ for node d. Consequently, if a source node of a link does not belong to the set of contributing nodes for node d, that link must be a weak link; thus, it shall be deleted from the graph.”).
“operate the combining nodes of the first one of the sequence of the hidden layers using weights that are assigned thereto to multiply and combine weighted metric values provided by the input nodes to generate combined metric values, and when the combined metric value generated by one of the combining nodes exceeds a firing threshold assigned to the combining node to then provide the combined metric value to the combining nodes of a next one of the sequence of the hidden layers;” (Xu discloses in sec. III para 6, “During each iteration, based on the range and distribution of the weights of the links in the DNN, we define a threshold for each hidden node (i.e., a node belonging to layer l, where 1 < l < k). A threshold α of a node d is chosen such that only major contributors of node d (i.e., nodes from a previous layer with strong links to node d) could be added into a set of contributing nodes Γ for node d. Let s and d be the source node and destination node of a link ϛ, respectively, and w(ϛ) be the connection weight of the link. If the absolute value of w(ϛ) is greater than or equal to α, then the contributing node s is added into Γ for node d. Consequently, if a source node of a link does not belong to the set of contributing nodes for node d, that link must be a weak link; thus, it shall be deleted from the graph.”).
“operate the combining nodes of a last one of the sequence of hidden layers using weights that are assigned thereto to multiply and combine the combined metric values provided by a plurality of combining nodes of a previous one of the sequence of hidden layers to generate combined metric values, and when the combined metric value generated by one of the combining nodes exceeds a firing threshold assigned to the combining node to then provide the combined metric value to the output node of the output layer;” (Xu discloses in sec. III para 6, “During each iteration, based on the range and distribution of the weights of the links in the DNN, we define a threshold for each hidden node (i.e., a node belonging to layer l, where 1 < l < k). A threshold α of a node d is chosen such that only major contributors of node d (i.e., nodes from a previous layer with strong links to node d) could be added into a set of contributing nodes Γ for node d. Let s and d be the source node and destination node of a link ϛ, respectively, and w(ϛ) be the connection weight of the link. If the absolute value of w(ϛ) is greater than or equal to α, then the contributing node s is added into Γ for node d. Consequently, if a source node of a link does not belong to the set of contributing nodes for node d, that link must be a weak link; thus, it shall be deleted from the graph.”).
“and operate the output node of the output layer to combine the combined metric values provided by the combining nodes of the last one of the sequence of hidden layers to generate the output value used for determining the error value that is correlated to the time sequence indicator in the ordered series.” (Xu discloses an output node in sec. V. B para 1, “In this selected neural network, the input layer contains 15 neurons, representing the predefined input features for house value assessments, and a single output neuron representing the assessed house value or predicted selling price. […] Finally, a single neuron in the last layer represents the assessed house value or predicted selling price.” Xu discloses that the back-propagation algorithm is used for training which involves calculation of an error value based on the value of the output node in sec. V. C para 1, “We set up suitable hyper-parameters for the structured DNN, and trained it using standard feedforward backpropagation algorithm with problem-specific real-time training and fitting techniques.”). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
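For illustration only, the thresholding passage quoted from Xu sec. III para 6 can be sketched as follows. This is a hypothetical rendering of the quoted text, not Xu's implementation: the function names and the dictionary representation of links are assumptions, and the threshold α is passed in as a parameter because the quoted passage does not state how α is derived from the range and distribution of the weights.

```python
def contributing_nodes(weights_into_d, alpha):
    """Return the set Γ of source nodes s whose link into node d has |w| >= alpha.

    weights_into_d maps each source node s to the weight of the link (s, d).
    """
    return {s for s, w in weights_into_d.items() if abs(w) >= alpha}


def prune_weak_links(weights_into_d, alpha):
    """Delete every link whose source node is not in Γ for node d."""
    gamma = contributing_nodes(weights_into_d, alpha)
    return {s: w for s, w in weights_into_d.items() if s in gamma}
```

For example, with link weights `{"s1": 0.8, "s2": -0.05, "s3": -0.6}` and α = 0.5, the set Γ contains `s1` and `s3`, and the link from `s2` is treated as weak and removed.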

Regarding Claim 9:
Bhalla in view of Xu teaches “The network management computer system of Claim 1” as seen above. 
Xu further teaches: 
“wherein an operation to provide to the input nodes of the neural network circuit the forecasted performance metrics and the measured performance metrics, comprises:” (Xu discloses input nodes receiving metrics in sec. V. B para 1, “In this selected neural network, the input layer contains 15 neurons, representing the predefined input features for house value assessments.” Xu discloses the neural network receiving forecasted and measured metrics in Fig. 1, shown below:

[Image: media_image5.png]
).

Bhalla further teaches: 
“combine a plurality of the measured performance metrics at time sequence indicators earlier than a present time sequence indicator to generate an aggregated measured performance metric;” (Bhalla discloses a network node performance analytic record (NPAR) that aggregates performance metrics that were captured in real-time and are associated with a date and time, which are time sequence indicators. [Bhalla ¶38: 1-11: “At block 410, the performance extractor 212 of the predictive analytics server 110 may aggregate performance metrics for network nodes in the network infrastructure to create a network node performance analytic record (NPAR). […] According to an example, the performance extractor may aggregate performance metrics from a predetermined time frame at least 48 hours prior to an occurrence of the fail condition for the network nodes.”]).
“and providing the aggregated measured performance metric to the neural network circuit as one of the measured performance metrics.” (Bhalla discloses providing NPAR data to a neural network. [Bhalla ¶38: 8-11: “According to an example, the performance extractor may aggregate performance metrics from a predetermined time frame at least 48 hours prior to an occurrence of the fail condition for the network nodes.”, ¶38: 5-8: “For example, the created NPAR may be a dataset that includes all the necessary performance metrics to be input into a predictive model that is trained to forecast an occurrence of a fail condition for network nodes.”, ¶38: 14-16: “The performance metrics in the NPAR may be divided by date (e.g., days) for each node to enable the performance metrics to be easily applied to model training.”, ¶16: 2-5: “The plurality of models refers to any predictive model that may be used to forecast a probability of an outcome such as, for example, a decision tree model, a neural network model”]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
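For illustration only, the aggregation limitation discussed for Claim 9 can be sketched as follows. This is a hypothetical example, not code from Bhalla: the function name `aggregate_before`, the `(time_index, value)` sample format, and the use of an arithmetic mean as the combining operation are assumptions.

```python
from statistics import mean


def aggregate_before(samples, present, combine=mean):
    """Combine measured values whose time index is earlier than `present`.

    samples is an iterable of (time_index, value) pairs.
    """
    earlier = [value for time_index, value in samples if time_index < present]
    if not earlier:
        raise ValueError("no measurements precede the present time index")
    return combine(earlier)
```

The returned value would then be supplied to the network as one of its measured inputs. A different combining function, such as `max` or `sum`, can be passed through the `combine` parameter.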

Regarding Claim 11:
Bhalla in view of Xu teaches “The network management computer system of Claim 1” as seen above. 
Bhalla further teaches: 
“wherein the at least one processor is further configured to: (Bhalla teaches the network node predictive system made up of a network node manager and provisioning server including processors. [Bhalla 29: 1-7: “The processor 202, which may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), or the like, is to perform various processing functions in the predictive analytics server 110. In an example, the network node manager 210 includes machine readable instructions stored on a non-transitory computer readable medium 213 and executable by the processor 202.”, 34: 3-11: “The provisioning server 115 is depicted as including a processor […] In an example, the provisioning server 115 includes machine readable instructions stored on a non-transitory computer readable medium 263 and executable by the processor 252.” ]).  
combine a plurality of the measured performance metrics in a stream during operation of the communication network to generate an aggregated measured performance metric;”  (Bhalla discloses a network node performance analytic record (NPAR) that aggregates performance metrics that were captured in real-time and are associated by date and time which are time sequence indicators. [Bhalla 38: 1-11: “At block 410, the performance extractor 212 of the predictive analytics server 110 may aggregate performance metrics for network nodes in the network infrastructure to create a network node performance analytic record (NPAR). […] According to an example, the performance extractor may aggregate performance metrics from a predetermined time frame at least 48 hours prior to an occurrence of the fail condition for the network nodes.”]).
“and control operation of the communication network based on output of the output node of the neural network circuit while processing through the input nodes of the neural network circuit the aggregated measured performance and forecasted aggregate performance metric.” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine that comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
Xu further teaches:
“generate a forecasted aggregate performance metric based on extrapolating from a series of aggregated measured performance metrics in the stream during earlier operation of the communication network;” (Examiner notes that the combination of Bhalla and Xu teach this limitation as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image1.png)]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months. ”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
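
For illustration only, the following minimal sketch shows one way the two operations mapped above for Claim 11 could look in code: combining measured metrics into an aggregated metric, and extrapolating a forecast from the earlier aggregated values. It is not taken from Bhalla, Xu, or the instant application. The function names, the use of a mean as the aggregate, and the use of a least-squares line as the extrapolation are all assumptions.

```python
from statistics import mean

def aggregate(window):
    """Combine the metrics measured at one time step into one value (assumed: the mean)."""
    return mean(window)

def extrapolate_next(series):
    """Fit a least-squares line through the series and project it one step ahead."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to extrapolate")
    x_bar = (n - 1) / 2
    y_bar = mean(series)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    slope = numerator / denominator
    # The next time step has index n, so project the line to x = n.
    return y_bar + slope * (n - x_bar)

# Hypothetical stream: two measured metrics per time step.
stream = [[10.0, 12.0], [12.0, 14.0], [14.0, 16.0]]
aggregated = [aggregate(window) for window in stream]
forecast = extrapolate_next(aggregated)
```

In this sketch both the aggregated values and the forecast would then be available as inputs to a predictive model.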

Regarding Claim 15:
Bhalla in view of Xu teaches “The network management computer system of Claim 1” as seen above. 
Bhalla further teaches: 
“wherein the operation to control operation of the communication network based on output of the output node of the neural network circuit while processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network,” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine that comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
“further comprises: communicating an alert notification toward an operator console which indicates that an identified network node has an operational fault, responsive to the measured performance metrics characterizing operation of the identified network node and the output of the output node of the neural network circuit indicating at least a threshold likelihood of a fault in operation of the identified network node.” (Bhalla teaches communicating an alert when a fault is identified. [Bhalla 35: 8-15: “Also, the network nodes may be queried to determine information related to the quality or a health of network node and generate alerts to network administrators if a network element is determined to be failing or operating in error. According to an example, the discovery engine 262 may receive a prediction of a fail condition for failing network nodes from the forecasting engine 216 of the predictive analytics server 110.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 

Regarding Claim 16:
	Bhalla teaches a system for predicting network node failure in a communication network. Bhalla teaches:
“A computer program product comprising: a non-transitory computer readable storage medium having computer readable program code stored in the medium and when executed by at least one processor of a network management computer system causes the network management computer system to perform operations comprising:” (Bhalla teaches the network node predictive system made up of a network node manager and provisioning server including processors. [Bhalla 29: 1-7: “The processor 202, which may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), or the like, is to perform various processing functions in the predictive analytics server 110. In an example, the network node manager 210 includes machine readable instructions stored on a non-transitory computer readable medium 213 and executable by the processor 202.”, 34: 3-11: “The provisioning server 115 is depicted as including a processor […] In an example, the provisioning server 115 includes machine readable instructions stored on a non-transitory computer readable medium 263 and executable by the processor 252.” ]).  
“accessing a network metrics repository to retrieve performance metrics that are measured during operation of a communication network, and to retrieve fault values which indicate whether defined types of network operation faults have occurred,” (Bhalla discloses a network node performance analytic record (NPAR) that stores performance metrics that were captured in real-time and fault values that is accessed by a processor. [Bhalla Fig. 2A, ¶14: lines 15-20: “A network infrastructure may include a set of network nodes providing voice and data services over a network to customer premises. A service as used herein may include Supplying or providing information over a network, and is also referred to as a communications network service”, ¶21: lines 7-15: “The predictive model of the disclosed examples may process the vast amount of data that is captured from thousands of different sources in approximately real-time to provide the network provider with a prediction of a network node failure at least 48 hours in advance. For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real-time”, ¶41: lines 1-6: “In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.”])   
wherein the communication network comprises a plurality of network nodes that receive and forward communication packets,”  (Bhalla teaches a communication network comprised of nodes that receives and forwards information which is what packets consist of. [Bhalla ¶14: 7-10: “For example, a network node may be a communication device that is capable of sending, receiving, or forwarding information over a communications channel in a network.”]).
“and controlling operation of the communication network based on output of the output node of the neural network circuit, the output node providing the output responsive to processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network, wherein an operation to control operation of the communication network comprises at least one of shifting communication packet traffic away from one of the network nodes toward one or more other ones of the network nodes or communicating a command to one of the network nodes instructing the network node to reboot at least a portion of executable operation code of the network node responsive to the measured performance metrics characterizing operation of the one of the network nodes and the output of the output node of the neural network circuit indicating at least a threshold likelihood of a fault in operation of the one of the network nodes.” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine that comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
However, Bhalla does not explicitly teach the specifics of the operation of the neural network. Bhalla does not teach the rest of the claimed limitations. 
	Xu teaches:
“generating forecasted performance metrics based on extrapolating from the measured performance metrics; (Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image1.png)]


providing to input nodes of a neural network circuit the forecasted performance metrics and the measured performance metrics;” (Xu discloses providing to the input of the neural network the forecasted metrics and measured metrics in Fig. 1, shown highlighted below:

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image2.png)]
 
“adapting weights and/or firing thresholds that are used by at least the input nodes of the neural network circuit real-time feedback of an output value of the output node of the neural network to reduce an error value based on comparison of the output value of the output node to at least one of the fault values from the network metrics repository;” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.” Xu discloses that the neural network is responsive to real-time feedback in sec. IV. B para 2, “ Each time when new labeled data points are added into the data pool, the structured DNN is incrementally trained and learns from the new data points in real time.” Xu discloses increasing accuracy (which is equivalent to reducing error) based on a comparison between the output value of the output node and the values from data, shown highlighted below in Algorithm 1: 

[Figure: Xu, Algorithm 1, with the relevant portion highlighted (media_image3.png)]
). 
Bhalla, Xu, and the instant application are analogous art because they are directed to prediction systems using neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla to include the specifics of neural network operation, as taught by Xu. One would be motivated to do so to improve upon conventional fully-connected neural networks, as suggested by Xu (Xu Abstract: “The experimental results show that a structured DNN outperforms conventional multivariate linear regression models, fully-connected neural networks, and prediction methods used by the leading real estate companies.”). 
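
For illustration only, the following minimal sketch shows the kind of weight adaptation discussed above: an output value is compared with a recorded fault value, and the weights are adjusted to reduce the resulting error, with a momentum term damping fluctuations in the weight changes. It is not Xu's Algorithm 1 and is not taken from Bhalla or the instant application. The single sigmoid output node, the learning rate, the momentum value, and the sample data are all assumptions.

```python
import math

def predict(weights, bias, x):
    """Single sigmoid output node: estimated likelihood of a fault."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.1, momentum=0.9):
    """Adapt weights by gradient descent with momentum; return weights, bias, mean error per epoch."""
    n_inputs = len(samples[0][0])
    weights = [0.01] * n_inputs   # small initial weights
    bias = 0.0
    vel_w = [0.0] * n_inputs      # running weight changes (momentum terms)
    vel_b = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, fault in samples:
            # Error: output value compared with the recorded fault value.
            error = predict(weights, bias, x) - fault
            total += abs(error)
            # Momentum blends the previous change with the new gradient step.
            vel_w = [momentum * v - lr * error * xi for v, xi in zip(vel_w, x)]
            vel_b = momentum * vel_b - lr * error
            weights = [w + v for w, v in zip(weights, vel_w)]
            bias += vel_b
        history.append(total / len(samples))
    return weights, bias, history

# Hypothetical labeled data: (metrics, fault value), where 1.0 means a fault occurred.
samples = [([0.9, 0.8], 1.0), ([0.1, 0.2], 0.0)]
weights, bias, history = train(samples)
```

Here `history` records the mean absolute error per pass over the data, so the reduction in error can be observed by comparing its first and last entries.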

Regarding Claim 17:
Bhalla in view of Xu teaches “The computer program product of Claim 16” as seen above. 
Bhalla further teaches:
“wherein the performance metrics are measured during operation of the communication network and are correlated to time sequence indicators for defined types of network operation performance characteristics, the fault values are correlated to the time sequence indicators and indicate whether defined types of network operation faults have occurred, and the operations by the at least one processor executing the computer readable program code further comprise:” (Bhalla discloses a network node performance analytic record (NPAR) that stores performance metrics that were captured in real-time and are associated by date which are time sequence indicators. [Bhalla ¶38: 1-11: “At block 410, the performance extractor 212 of the predictive analytics server 110 may aggregate performance metrics for network nodes in the network infrastructure to create a network node performance analytic record (NPAR).”, ¶21: 11-14: “For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real time.”, ¶38: 14-15: “The performance metrics in the NPAR may be divided by date (e.g., days)”]. The NPAR further stores fault values that are labelled with their corresponding date. [Bhalla ¶41: lines 1-6: “In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.” and Fig. 5]).
“and controlling operation of the communication network based on further output of the output node of the neural network circuit, the output node providing the further output responsive to processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network.” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine that comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
Xu further teaches:
“repeating operations for an ordered series of the time sequence indicators to:” (Xu discloses repeating training in Algorithm 1, shown highlighted below: 

[Figure: Xu, Algorithm 1, with the relevant portion highlighted (media_image6.png)]

“for at least one of the defined types of network operation performance characteristics, generate a forecasted performance metric based on extrapolating from a sequence of the measured performance metrics retrieved from the network metrics repository that are for the type of network operation performance characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series;” (Examiner notes that the combination of Bhalla and Xu teach this limitation as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image1.png)]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months. ”)
“provide to the input nodes of the neural network circuit the forecasted performance metrics and the measured performance metrics that are correlated to the time sequence indicator in the ordered series;” (Xu discloses providing to the input of the neural network the forecasted metrics and measured metrics which are associated with time indicators in Fig. 1, shown highlighted below:

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image2.png)]
 
“determine the error value based on comparison of an output value of the output node of the neural network circuit to at least one of the fault values from the network metrics repository that is correlated to the time sequence indicator in the ordered series;” (Xu discloses increasing accuracy (which is equivalent to reducing error) based on a comparison between the output value of the output node and the values from data, shown highlighted below in Algorithm 1: 

[Figure: Xu, Algorithm 1, with the relevant portion highlighted (media_image3.png)]
 
“and adapt weights and/or firing thresholds, which are used by at least the input nodes of the neural network circuit to generate outputs to the combining nodes of a first one of the sequence of the hidden layers, to reduce the error value;” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 16. 

Regarding Claim 20:
	Bhalla teaches a system for predicting network node failure in a communication network. Bhalla teaches: 
“A method by a network management computer system comprising: accessing a network metrics repository to retrieve performance metrics that are measured during operation of a communication network,” (Bhalla discloses a network node performance analytic record (NPAR) that stores performance metrics that were captured in real-time and fault values that is accessed by a processor. [Bhalla ¶14: lines 15-20: “A network infrastructure may include a set of network nodes providing voice and data services over a network to customer premises. A service as used herein may include Supplying or providing information over a network, and is also referred to as a communications network service”, ¶21: lines 7-15: “The predictive model of the disclosed examples may process the vast amount of data that is captured from thousands of different sources in approximately real-time to provide the network provider with a prediction of a network node failure at least 48 hours in advance. For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real-time”]). 
“and to retrieve fault values which indicate whether defined types of network operation faults have occurred, wherein the communication network comprises a plurality of network nodes that receive and forward communication packets;” (The NPAR further stores fault values that are labelled with their corresponding date. [Bhalla Fig. 5 and ¶41: lines 1-6: “In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.”])  
“and controlling operation of the communication network based on further output of the output node of the neural network circuit, the output node providing the further output responsive to processing through the input nodes of the neural network circuit a stream of measured performance metrics and forecasted performance metrics that are obtained during operation of the communication network, wherein an operation to control operation of the communication network comprises at least one of shifting communication packet traffic away from one of the network nodes toward one or more other ones of the network nodes or communicating a command to one of the network nodes instructing the network node to reboot at least a portion of executable operation code of the network node responsive to the measured performance metrics characterizing operation of the one of the network nodes and the output of the output node of the neural network circuit indicating at least a threshold likelihood of a fault in operation of the one of the network nodes.” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine that comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
However, Bhalla does not explicitly teach the specifics of the operation of the neural network. Bhalla does not teach the rest of the claimed limitations. 
	Xu teaches:
“generating forecasted performance metrics based on extrapolating from the measured performance metrics;” (Examiner notes that extrapolation is merely predicting by projecting known data as per Merriam Webster Dictionary and using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image1.png)]

“providing to input nodes of a neural network circuit the forecasted performance metrics and the measured performance metrics;” (Xu discloses providing to the input of the neural network the forecasted metrics and measured metrics in Fig. 1, shown highlighted below:

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image2.png)]
 
“adapting weights and/or firing thresholds that are used by at least the input nodes of the neural network circuit responsive to real-time feedback of an output value of the output node of the neural network to reduce an error value based on comparison of the output value of the output node to at least one of the fault values from the network metrics repository;” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.” Xu discloses that the neural network is responsive to real-time feedback in sec. IV. B para 2, “ Each time when new labeled data points are added into the data pool, the structured DNN is incrementally trained and learns from the new data points in real time.” Xu discloses increasing accuracy (which is equivalent to reducing error) based on a comparison between the output value of the output node and the values from data, shown highlighted below in Algorithm 1: 

[Figure: Xu, Algorithm 1, with the relevant portion highlighted (media_image3.png)]
). 
Bhalla, Xu, and the instant application are analogous art because they are directed to prediction systems using neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla to include the specifics of neural network operation, as taught by Xu. One would be motivated to do so to improve upon conventional fully-connected neural networks, as suggested by Xu (Xu Abstract: “The experimental results show that a structured DNN outperforms conventional multivariate linear regression models, fully-connected neural networks, and prediction methods used by the leading real estate companies.”). 

Regarding Claim 21:
Bhalla in view of Xu teaches “The network management computer system of Claim 1,” as seen above. 
Xu further teaches: 
wherein the measured performance metrics are received at different rates, (Xu discloses in Fig. 1 that both real-time and non-real-time data are received. Examiner notes that the real-time data would be received at a higher rate than the non-real-time data. 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image7.png)]

and wherein the statistical representation is provided to the input nodes at a lower rate than the higher rate. (Examiner notes that the combination of Bhalla and Xu teaches this limitation. Xu discloses that the real-time data is pre-processed in sec. IV. B para 1, “The framework for real-time training of a structured DNN is illustrated in Fig. 1. The first step is to collect the training and test data points. As the collected data are raw data that usually contain a lot of unnecessary information, we need to preprocess them and retrieve the needed fields in a desired format. In addition, data points with missing information or wrong information could negatively affect the training results of a neural network, such data points are considered outliers, and thus they are removed from the training and test datasets.” Examiner notes that the real-time data (higher rate data, as stated above) is provided to the input nodes at a lower rate because the pre-processing requires time.) 
Bhalla teaches:
wherein higher rate measured performance metrics received at a higher rate are combined to create a statistical representation of the higher rate measured performance metrics, (Examiner notes that statistical representation is not well defined. Bhalla discloses that real-time data is mapped into an NPAR. [Bhalla ¶21: lines 7-15: “The predictive model of the disclosed examples may process the vast amount of data that is captured from thousands of different sources in approximately real-time to provide the network provider with a prediction of a network node failure at least 48 hours in advance. For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real-time. A model constructor of the predictive analytics server may then select input variables that are derived from the most current and updated data in NPAR to train each of a plurality of models.”] Examiner notes that the real-time data is combined into an NPAR and that the variables representing the data are the statistical representation.) 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
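
For illustration only, the following minimal sketch shows one reading of the Claim 21 limitation discussed above: metrics arriving at a higher rate are combined into a statistical representation that is produced at a lower rate. It is not taken from Bhalla, Xu, or the instant application. The function name, the choice of mean, minimum, and maximum as the statistics, and the group size are all assumptions.

```python
from statistics import mean

def summarize(samples, group_size):
    """Collapse each full run of `group_size` high-rate samples into (mean, min, max)."""
    summaries = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        chunk = samples[start:start + group_size]
        summaries.append((mean(chunk), min(chunk), max(chunk)))
    return summaries

# Hypothetical high-rate metric: six samples reduced to two summaries.
high_rate = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
low_rate = summarize(high_rate, 3)
```

With a group size of 3, the summaries are emitted at one third of the rate of the incoming samples.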

Regarding Claim 22:
Bhalla in view of Xu teaches “The network management computer system of Claim 1,” as seen above. 
Xu further teaches: 
wherein the forecasted performance metrics are generated at different rates based on the measured performance characteristics, (Xu discloses in Fig. 1 that both real-time and non-real-time data are received. Examiner notes that the forecasted data for real-time and non-real-time data would be generated at different rates. 

[Figure: Xu, Fig. 1, with the relevant portion highlighted (media_image8.png)]

and wherein the statistical representation is provided to the input nodes at a lower rate than the higher rate. (Examiner notes that the combination of Bhalla and Xu teaches this limitation. Xu discloses that the data is pre-processed in sec. IV. B para 1, “The framework for real-time training of a structured DNN is illustrated in Fig. 1. The first step is to collect the training and test data points. As the collected data are raw data that usually contain a lot of unnecessary information, we need to preprocess them and retrieve the needed fields in a desired format. In addition, data points with missing information or wrong information could negatively affect the training results of a neural network, such data points are considered outliers, and thus they are removed from the training and test datasets.” Examiner notes that the data is provided to the input nodes at a lower rate because the pre-processing requires time.) 
Bhalla teaches:
wherein higher rate forecasted performance metrics generated at a higher rate are combined to create a statistical representation of the higher rate forecasted performance metrics, (Examiner notes that statistical representation is not well defined. Bhalla discloses that real-time data is mapped into an NPAR. [Bhalla ¶21: lines 7-15: “The predictive model of the disclosed examples may process the vast amount of data that is captured from thousands of different sources in approximately real-time to provide the network provider with a prediction of a network node failure at least 48 hours in advance. For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real-time. A model constructor of the predictive analytics server may then select input variables that are derived from the most current and updated data in NPAR to train each of a plurality of models.”] Bhalla further discloses that the forecasted real-time performance metrics are also stored in the NPAR. [Bhalla ¶40-41: “At block 420, the forecasting engine 216 of the predictive analytics server 110, for instance, may define the fail condition for the network nodes. […] In a disclosed example, an outage flag may be used to record the fail condition. Referring to FIG. 5, the performance extractor 212, for instance, may create a fail condition table 500 in the NPAR that tracks a fail condition history for each node (e.g., node 1, node 2, node 3, etc.) in the network infrastructure.”] Examiner notes that the forecasted data is combined into an NPAR and that the variables representing the data are the statistical representation.) 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Bhalla with the teachings of Xu for at least the same reasons as discussed above in claim 1. 
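For illustration only, and not as a characterization of the claims or of either reference, the following minimal Python sketch shows one way higher rate values could be combined into a statistical representation that is then available at a lower rate. The function name `summarize_windows`, the window length, the choice of mean and standard deviation, and the sample values are all assumptions introduced for this sketch.

```python
from statistics import mean, pstdev


def summarize_windows(samples, window):
    """Combine each run of `window` higher-rate samples into one (mean, std) pair.

    Hypothetical helper: one summary is produced per `window` inputs, so the
    summaries arrive at 1/window of the input rate.
    """
    summaries = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        summaries.append((mean(chunk), pstdev(chunk)))
    return summaries


# Made-up higher-rate forecast values, summarized four at a time.
forecasts = [1.0, 3.0, 2.0, 4.0, 10.0, 10.0, 10.0, 10.0]
summaries = summarize_windows(forecasts, window=4)
```

Eight higher rate values yield two summaries here, which is the sense in which the combined representation is provided at a lower rate than the values it summarizes.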

Claims 3-4, 8, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalla in view of Xu and further in view of Cote et al. (US5461699A) (herein thereafter Cote). 

Regarding Claim 3:
Bhalla in view of Xu teaches “The network management computer system of Claim 2” as seen above. 
Xu further teaches: 
 “wherein the at least one processor is further configured to: […] identify parameters of a mathematical relationship forming a trend through a historical sequence of the measured performance metrics in the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the one of the defined types of the network operation faults,” (Xu discloses identifying parameters (i.e. weights) in sec. III para 5, “Then we design a fully-connected DNN with k layers and nl neurons in each layer, where 1 ≤ l ≤ k, and nl and nk are the number of predefined input features and the number of outputs, respectively. After training the model, we identify the links with weak weights that are below a predefined threshold. We call such links weak links, which contribute less to the next layer than a normal link. As such, weak links can be safely deleted from the DNN. Similarly, we identify the neurons with very few links, called weak neurons. Such neurons can be either deleted or combined into a stronger one.”)
“wherein, for at least one of the defined types of network operation performance characteristics that correlate to a time sequence indicator at an occurrence of the one of the defined types of the network operation faults, a forecasted performance metric is generated using the parameters of the mathematical relationship to extrapolate from a sequence of the measured performance metrics in the network metrics repository that are for the type of network operation performance characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series at the occurrence of the one of the defined types of the network operation faults.” (Examiner notes that the combination of Bhalla and Xu teaches this limitation, as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data, as per the Merriam-Webster Dictionary, and that using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with Examiner's highlighting (media_image1.png)]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months. ”)
However, neither Bhalla nor Xu teaches performing the functions cited above “for one of the defined types of the network operation faults.”
Cote teaches a system for predicting abnormal behavior in networks. [Cote Abstract: “A system to predict events in a telecommunications network includes a processor; and memory storing instructions that, when executed, cause the processor to, responsive to obtained Performance Monitoring (PM) data over time from the telecommunications network”]. Cote teaches: 
“for one of the defined types of the network operation faults,” (Examiner notes that in the instant specification, one of the defined types of network operation faults is bit error rate. Cote teaches that one of the performance metrics used is error rate. [Cote 43: 1-12: “Examples of PM data include, without limitation, optical layer data, packet layer data, service and traffic layer data, alarms, hardware operating metrics, etc. […] The packet layer data can include port level information such as bandwidth, throughput, latency, jitter, error rate, RX bytes/packets, TX bytes/packets, dropped packet bytes, etc.”]). 
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Cote, and the instant application are analogous art because they are directed to network failure prediction systems that use neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu so that “for one of the defined types of the network operation faults, identify parameters of a mathematical relationship forming a trend through a historical sequence of the measured performance metrics in the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the one of the defined types of the network operation faults,” as taught by Cote. One would be motivated to do so to increase efficiency by reducing the size of the input data used for training ML models in the predictive system, as suggested by Cote (Cote 66: 1-6: “the process 52 includes, for each time bin, reducing a PM to a single number representing the probability of being normal (or “p-value”) of the device/service/application that is being monitored. This transforms the n-dimensional time-series into a 1-dimensional distribution, which is much easier to model”). 

Regarding Claim 4:
Bhalla in view of Xu and Cote teaches “The network management computer system of Claim 3” as seen above. 
Xu further teaches: 
 “wherein the at least one processor is further configured to: […] identify parameters of another mathematical relationship forming a trend through a historical sequence of the measured performance metrics in the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the another one of the defined types of the network operation faults,” (Xu discloses identifying parameters (i.e. weights) in sec. III para 5, “Then we design a fully-connected DNN with k layers and nl neurons in each layer, where 1 ≤ l ≤ k, and nl and nk are the number of predefined input features and the number of outputs, respectively. After training the model, we identify the links with weak weights that are below a predefined threshold. We call such links weak links, which contribute less to the next layer than a normal link. As such, weak links can be safely deleted from the DNN. Similarly, we identify the neurons with very few links, called weak neurons. Such neurons can be either deleted or combined into a stronger one.”)
“wherein, for at least one of the defined types of network operation performance characteristics that correlate to a time sequence indicator at an occurrence of the another one of the defined types of the network operation faults, a forecasted performance metric is generated using the parameters of the another mathematical relationship to extrapolate from a sequence of the measured performance metrics in the network metrics repository that are for the type of network operation performance characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series at the occurrence of the another one of the defined types of the network operation faults.” (Examiner notes that the combination of Bhalla and Xu teaches this limitation, as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data, as per the Merriam-Webster Dictionary, and that using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with Examiner's highlighting (media_image1.png)]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months. ”)
However, neither Bhalla nor Xu teaches performing the functions cited above “for another one of the defined types of the network operation faults.”
Cote teaches:
“for another one of the defined types of the network operation faults,” (Examiner notes that in the instant specification, one of the defined types of network operation faults is dropped packet rate. Cote teaches that the performance metrics used include dropped packet bytes and in-service time, which make up dropped packet rate. [Cote 43: 1-23: “Examples of PM data include, without limitation, optical layer data, packet layer data, service and traffic layer data, alarms, hardware operating metrics, etc. […] The packet layer data can include port level information such as bandwidth, throughput, latency, jitter, error rate, RX bytes/packets, TX bytes/packets, dropped packet bytes, etc. […] The hardware operating metrics can include temperature, memory usage, in-service time, etc.”]).  
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Bhalla combined with Xu with the teachings of Cote for at least the same reasons as discussed above in claim 3. 

Regarding Claim 8:
	Bhalla in view of Xu teaches “The network management computer system of Claim 2” as seen above. 
	Bhalla further teaches:
“wherein the communication network comprises at least one network node that receives and forwards communication packets” (Bhalla teaches a communication network comprised of nodes that receive and forward information, which is what packets consist of. [Bhalla ¶14: 7-10: “For example, a network node may be a communication device that is capable of sending, receiving, or forwarding information over a communications channel in a network.”]).
However, neither Bhalla nor Xu teaches the further elements of claim 8.
Cote teaches:
“the defined types of network operation performance characteristics comprise at least two of the following: network node input buffer memory utilization; network node output buffer memory utilization; network node input packet traffic bit error rate; network node output packet traffic bit error rate; network node input traffic dropped packet rate; network node output traffic dropped packet rate; network node processor utilization; network node code memory utilization; network node packet processing memory utilization; and network communication latency.” (Cote teaches at least two of the elements, including network communication latency and network node dropped packet rate. Cote teaches that two of the performance metrics used are dropped packet bytes and in-service time, which make up dropped packet rate. [Cote 43: 1-23: “Examples of PM data include, without limitation, optical layer data, packet layer data, service and traffic layer data, alarms, hardware operating metrics, etc. […] The packet layer data can include port level information such as bandwidth, throughput, latency, jitter, error rate, RX bytes/packets, TX bytes/packets, dropped packet bytes, etc. […] The hardware operating metrics can include temperature, memory usage, in-service time, etc.”]).  
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Cote, and the instant application are analogous art because they are directed to network failure prediction systems that use neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu so that “for one of the defined types of the network operation faults, identify parameters of a mathematical relationship forming a trend through a historical sequence of the measured performance metrics in the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the one of the defined types of the network operation faults,” as taught by Cote. One would be motivated to do so to increase efficiency by reducing the size of the input data used for training ML models in the predictive system, as suggested by Cote (Cote 66: 1-6: “the process 52 includes, for each time bin, reducing a PM to a single number representing the probability of being normal (or “p-value”) of the device/service/application that is being monitored. This transforms the n-dimensional time-series into a 1-dimensional distribution, which is much easier to model”). 

Regarding Claim 18:
Bhalla in view of Xu teaches “The computer program product of Claim 17” as seen above. 
Bhalla further teaches: 
“wherein the operations by the at least one processor executing the computer readable program code further comprise:” (Bhalla discloses a processor executing computer readable instructions in para 29 lines 1-7: “The processor 202, which may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), or the like, is to perform various processing functions in the predictive analytics server 110. In an example, the network node manager 210 includes machine readable instructions stored on a non-transitory computer readable medium 213 and executable by the processor 202.”). 
Xu further teaches: 
“identifying parameters of a mathematical relationship forming a trend through a historical sequence of the measured performance metrics retrieved from the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the one of the defined types of the network operation faults,” (Xu discloses that each value of a node is associated with a weight in sec. III para 5, “Then we design a fully-connected DNN with k layers and nl neurons in each layer, where 1 ≤ l ≤ k, and nl and nk are the number of predefined input features and the number of outputs, respectively. After training the model, we identify the links with weak weights that are below a predefined threshold. We call such links weak links, which contribute less to the next layer than a normal link.” Xu discloses that the initial weights are assigned in sec. IV. B para 2, “Before starting the training process, the structured DNN is initialized with random weights.”). 
“wherein, for at least one of the defined types of network operation performance characteristics that correlate to a time sequence indicator at an occurrence of the one of the defined types of the network operation faults, generating a forecasted performance metric using the parameters of the mathematical relationship to extrapolate from a sequence of the measured performance metrics in the network metrics repository that are for the type of network operation performance characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series at the occurrence of the one of the defined types of the network operation faults.” (Examiner notes that the combination of Bhalla and Xu teaches this limitation, as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data, as per the Merriam-Webster Dictionary, and that using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 

[Figure: Xu, Fig. 1, with Examiner's highlighting (media_image1.png)]
 
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months. ”)
Cote teaches: 
“for one of the defined types of the network operation faults,” (Examiner notes that in the instant specification, one of the defined types of network operation faults is dropped packet rate. Cote teaches that the performance metrics used include dropped packet bytes and in-service time, which make up dropped packet rate. [Cote 43: 1-23: “Examples of PM data include, without limitation, optical layer data, packet layer data, service and traffic layer data, alarms, hardware operating metrics, etc. […] The packet layer data can include port level information such as bandwidth, throughput, latency, jitter, error rate, RX bytes/packets, TX bytes/packets, dropped packet bytes, etc. […] The hardware operating metrics can include temperature, memory usage, in-service time, etc.”]).  
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Cote, and the instant application are analogous art because they are directed to network failure prediction systems that use neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu so that “for one of the defined types of the network operation faults, identifying parameters of a mathematical relationship forming a trend through a historical sequence of the measured performance metrics retrieved from the network metrics repository that are correlated to the time sequence indicators in the ordered series that start before and continue to an occurrence of the one of the defined types of the network operation faults,” as taught by Cote. One would be motivated to do so to increase efficiency by reducing the size of the input data used for training ML models in the predictive system, as suggested by Cote (Cote 66: 1-6: “the process 52 includes, for each time bin, reducing a PM to a single number representing the probability of being normal (or “p-value”) of the device/service/application that is being monitored. This transforms the n-dimensional time-series into a 1-dimensional distribution, which is much easier to model”). 

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalla in view of Xu, and further in view of Moreira et al. (“Neural Networks with Adaptive Learning Rate and Momentum Terms”) (herein thereafter Moreira). 

Regarding Claim 6:
Bhalla in view of Xu teaches “The network management computer system of Claim 2” as seen above. 
Xu also teaches: 
“wherein the adaptation of weights and/or firing thresholds, which are used by at least the input nodes of the neural network circuit to generate outputs to the combining nodes of a first one of the sequence of the hidden layers, to reduce the error value, comprises:” (Xu discloses adapting weights in sec. V. C para 1-2, “Although smaller initialized weights make a neural network learn slower, experimental results show that, with enough available data points, initializing a neural network with smaller weights helps to get better generalization, and hence to achieve better performance. […] To balance the training speed and the stability of the network, a momentum is used to diminish the fluctuations in weight changes.”)
However, neither Bhalla nor Xu teaches “determining volatility in a sequence of the measured performance metrics in the network metrics repository that are for one type of network operation performance characteristic and that correlate to some time sequence indicators that precede a present time sequence indicator in an ordered series.”
Moreira teaches the gradient descent method for backpropagation for neural networks. Moreira teaches: 
“determining volatility in a sequence of the measured performance metrics in the network metrics repository that are for one type of network operation performance characteristic and that correlate to at least one of the time sequence indicators that precede a present time sequence indicator in an ordered series;” (Examiner notes that volatility is not defined in the instant specification and is interpreted as the state of being characterized by or subject to rapid or unexpected change, as per the definition in the Merriam-Webster Dictionary. Examiner notes that rapid changes in measured performance metrics would lead to rapid changes in the error because the measured performance metrics include the measured fault values, which are compared with the output of the neural network to calculate the error. Thus, volatility in the error would have a relationship with volatility of the measured performance metrics. Moreira teaches using gradient descent, which depends on the gradient (i.e. the volatility) of the error. [Moreira sec. 1.1: page 1: “The minimization of the error function is carried out using a gradient descent technique. The necessary corrections to the weights of the network for each iteration n are obtained by calculating the partial derivative of the error function in relation to each weight wij which gives a direction of steepest descent.”]).
“adapting the weights and/or firing thresholds further based on the determined volatility in the sequence of the measured performance metrics.” (Moreira teaches adapting weights based on the gradient of the error. [Moreira sec 1.1: page 1: “The minimization of the error function is carried out using a gradient descent technique. The necessary corrections to the weights of the network for each iteration n are obtained by calculating the partial derivative of the error function in relation to each weight wij which gives a direction of steepest descent.”]). 
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Moreira, and the instant application are analogous art because they are directed to neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu to include the above claimed limitations, as taught by Moreira. One would be motivated to do so to increase efficiency during training of the neural network, as suggested by Moreira (Moreira sec. 2 page 3: “There are two fundamental reasons that justify a study of adaptive learning rate schedules. One is that the amount of weight update can be allowed to adapt to the shape of the error surface at each particular situation. The value of the learning rate should be sufficiently large to allow a fast learning process but small enough to guarantee its effectiveness. […] The other reason is that with automatic adaptation of the learning rate the trial-and-error search for the best initial values for the parameter can be avoided.”). 

Regarding Claim 7:
Bhalla in view of Xu and Moreira teaches “The network management computer system of Claim 6” as seen above. 
Moreira teaches:
“wherein the adaptation of the weights and/or firing thresholds further based on the determined volatility in the sequence of the measured performance metrics, comprises:”  (Moreira teaches adapting weights based on the gradient of the error (see above for further details on volatility and error). [Moreira sec 1.1: page 1: “The minimization of the error function is carried out using a gradient descent technique. The necessary corrections to the weights of the network for each iteration n are obtained by calculating the partial derivative of the error function in relation to each weight wij which gives a direction of steepest descent.”]).
“decreasing a rate of change in the weights and/or firing thresholds further based on the determined volatility increasing;” (Moreira teaches that when oscillations in error occur (i.e. high volatility), the momentum parameter decreases the rate of change of weights. [Moreira sec 1.1: page 2: “alpha is the momentum parameter and determines the amount of influence from the previous iteration on the present one. The momentum introduces a “damping” effect on the search procedure, thus avoiding oscillations in irregular areas of the error surface by averaging gradient components with opposite sign and accelerating the convergence in long flat areas. In some situations it possibly avoids the search procedure from being stoped in a local minimum, helping it to skip over those regions without performing any minimization there.”]). 
“and increasing a rate of change in the weights and/or firing thresholds further based on the determined volatility decreasing.” (Moreira teaches accelerating convergence (i.e. increasing rate of change of weights) in flat areas (i.e. no volatility). [Moreira sec 1.1: page 2: “alpha is the momentum parameter and determines the amount of influence from the previous iteration on the present one. The momentum introduces a “damping” effect on the search procedure, thus avoiding oscillations in irregular areas of the error surface by averaging gradient components with opposite sign and accelerating the convergence in long flat areas. In some situations it possibly avoids the search procedure from being stoped in a local minimum, helping it to skip over those regions without performing any minimization there.”]). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combined system of Bhalla in view of Xu with the teachings of Moreira for at least the same reasons as discussed above in claim 6. 
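For illustration only, the momentum behavior described in the Moreira passages quoted above can be sketched with the standard momentum update, in which each weight change is the negative gradient scaled by a learning rate plus a fraction alpha of the previous weight change. The function name `momentum_steps`, the parameter values, and the gradient sequences below are assumptions chosen for this sketch and are not taken from Moreira.

```python
def momentum_steps(gradients, learning_rate=0.1, alpha=0.9):
    """Return the weight change applied at each iteration.

    Standard momentum rule (illustrative values, not from the reference):
        delta(n) = -learning_rate * gradient(n) + alpha * delta(n - 1)
    """
    deltas = []
    previous = 0.0
    for g in gradients:
        previous = -learning_rate * g + alpha * previous
        deltas.append(previous)
    return deltas


# Gradients of constant sign (a long flat slope): the steps grow in size.
flat = momentum_steps([1.0, 1.0, 1.0, 1.0])

# Gradients of alternating sign (an oscillating region): the steps partly cancel.
oscillating = momentum_steps([1.0, -1.0, 1.0, -1.0])
```

With these made-up inputs, the step magnitude grows across iterations when the gradient keeps its sign and stays at or below the plain gradient descent step of 0.1 when the gradient alternates in sign, which matches the damping and accelerating effects described in the quoted passage.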

Claims 10, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalla in view of Xu, and further in view of Bengio (“Practical Recommendations for Gradient-Based Training of Deep Architectures”) (herein thereafter Bengio). 

Regarding Claim 10:
Bhalla in view of Xu teaches “The network management computer system of Claim 9” as seen above.
However, neither Bhalla nor Xu teaches “wherein a number of the measured performance metrics that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.”
Bengio discloses recommendations for implementing gradient descent for neural networks. Bengio teaches:
“The network management computer system of Claim 9, wherein a number of the measured performance metrics that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.” (Examiner notes that aggregated performance metrics are only defined by performance metrics that are combined in some way. One way to group (i.e. combine) different parts of a training set is through batches. Bengio discloses choosing an optimal batch size depending on training time. [Bengio sec. 3.1.1: page 9: “The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation (with appropriate implementations) but requires visiting more examples in order to reach the same error, since there are less updates per epoch. In theory, this hyper-parameter should impact training time”]). 
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Bengio, and the instant application are analogous art because they are directed to neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu to determine the “number of the measured performance metrics that are combined to generate the aggregated measured performance metric” based on “an epoch cycle time of the neural network circuit,” as taught by Bengio. One would be motivated to do so to optimize training of the neural network by reducing training time, as suggested by Bengio (Bengio sec. 3.1.1 page 9: “The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation […] In theory, this hyper-parameter should impact training time”). 
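For illustration only, the relationship between mini-batch size and the number of weight updates per epoch noted in the Bengio passage can be sketched as follows. The helper names `updates_per_epoch` and `batches`, and the dataset sizes, are assumptions introduced for this sketch and do not appear in Bengio.

```python
import math


def updates_per_epoch(num_examples, batch_size):
    """Number of weight updates in one pass over the data (one update per batch)."""
    return math.ceil(num_examples / batch_size)


def batches(examples, batch_size):
    """Yield consecutive groups of up to `batch_size` examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


# Hypothetical training set of 1000 examples.
per_example = updates_per_epoch(1000, 1)    # one update per example
default_b32 = updates_per_epoch(1000, 32)   # batch size B = 32

# Ten examples grouped four at a time give batches of sizes 4, 4 and 2.
grouped = list(batches(list(range(10)), 4))
```

A larger batch size combines more examples into each update and so gives fewer updates per epoch, which is the trade-off between computation speed and number of updates that the quoted passage describes.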

Regarding Claim 12:
Bhalla in view of Xu teaches “The network management computer system of Claim 11” as seen above. 
However, neither Bhalla nor Xu teaches “wherein a number of the measured performance metrics in the stream that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.”
Bengio teaches: 
“wherein a number of the measured performance metrics in the stream that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.” (Examiner notes that aggregated performance metrics are only defined by performance metrics that are combined in some way. One way to group (i.e. combine) different parts of a training set is through batches. Bengio discloses choosing an optimal batch size depending on training time. [Bengio sec. 3.1.1: page 9: “The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation (with appropriate implementations) but requires visiting more examples in order to reach the same error, since there are less updates per epoch. In theory, this hyper-parameter should impact training time”]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Bhalla in view of Xu with the teachings of Bengio for at least the same reasons as discussed above in Claim 10. 

Regarding Claim 19:
Bhalla in view of Xu teaches “The computer program product of Claim 16” as seen above. 
Bhalla further teaches: 
“wherein the operations by the at least one processor executing the computer readable program code further comprise:” (Bhalla teaches the network node predictive system made up of a network node manager and provisioning server including computer readable instructions executable by a processor. [Bhalla 29: 1-7: “The processor 202, which may be a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), or the like, is to perform various processing functions in the predictive analytics server 110. In an example, the network node manager 210 includes machine readable instructions stored on a non-transitory computer readable medium 213 and executable by the processor 202.”, 34: 8-11: “In an example, the provisioning server 115 includes machine readable instructions stored on a non-transitory computer readable medium 263 and executable by the processor 252.” ]). 
“combining a plurality of the measured performance metrics in a stream during operation of the communication network to generate an aggregated measured performance metric;” (Bhalla teaches combining measured performance metrics into the NPAR. [Bhalla 38: 1-11: “At block 410, the performance extractor 212 of the predictive analytics server 110 may aggregate performance metrics for network nodes in the network infrastructure to create a network node performance analytic record (NPAR).”, 21: 11-14: “For example, a performance extractor of a predictive analytics server may capture the vast data from the different sources in and map the data into a (NPAR) in real time.”]).
“and controlling operation of the communication network based on output of the output node of the neural network circuit while processing through the input nodes of the neural network circuit the aggregated measured performance and forecasted aggregate performance metric,” (Bhalla discloses a provisioning server that remotely configures communication network nodes depending on the output of the forecasting engine, which comprises a neural network. [Bhalla ¶71: 6-12: “At block 720, the forecasting engine 216 may prioritize a maintenance schedule for each of the network nodes. Accordingly, at block 730, the forecasting engine 216 may implement the provisioning server 115 to remotely configure the network nodes that are likely to reach the fail condition in the near future according to the prioritized maintenance schedule.”]).
Xu further teaches:
“generating a forecasted aggregate performance metric based on extrapolating from a series of aggregated measured performance metrics in the stream during earlier operation of the communication network;” (Examiner notes that the combination of Bhalla and Xu teaches this limitation, as Bhalla teaches network metrics. Examiner notes that extrapolation is merely predicting by projecting known data, as per the Merriam-Webster Dictionary, and that using a neural network for prediction is extrapolation. Xu discloses generating forecasted metrics based on extrapolation in Fig. 1, shown highlighted below: 
[Figure: Xu Fig. 1 (media_image1.png, greyscale)]
Xu further discloses that the extrapolation is from a sequence of the measured metrics in the data pool that are for the type of characteristic and that correlate to at least one of the time sequence indicators that precede the time sequence indicator in the ordered series in sec. V. A para 1, “To establish the data pool, we collected real estate data from a leading real estate listings website Zillow.com. This website maintains all recent and past house listings data including house features, market features, public records of houses, neighborhood features, and so on. […] The predefined features include number of beds, number of baths, square footage, lot size, built year, yearly tax, similar houses average sold price, nearby schools average ratings, fireplace, waterfront, number of stories, heating, cooling, patio, and park. [...] We further calculate the average selling price-per-square-feet of houses sold in last 6 months.”).
Neither Bhalla nor Xu teaches “wherein a number of the measured performance metrics in the stream that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.”
Bengio discloses recommendations for implementing gradient descent for neural networks. Bengio teaches:
“wherein a number of the measured performance metrics in the stream that are combined to generate the aggregated measured performance metric is determined based on an epoch cycle time of the neural network circuit.” (Examiner notes that aggregated performance metrics are only defined by performance metrics that are combined in some way. One way to group (i.e. combine) different parts of a training set is through batches. Bengio discloses choosing an optimal batch size depending on training time. [Bengio sec. 3.1.1: page 9: “The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation (with appropriate implementations) but requires visiting more examples in order to reach the same error, since there are less updates per epoch. In theory, this hyper-parameter should impact training time”]).
The network failure prediction system taught by Bhalla in view of Xu, the teachings of Bengio, and the instant application are analogous art because they are directed to neural networks. 
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the network node failure prediction system of Bhalla in view of Xu to determine the “number of the measured performance metrics that are combined to generate the aggregated measured performance metric” based on “an epoch cycle time of the neural network circuit.”, as taught by Bengio. One would be motivated to do so to optimize training of the neural network by reducing training time, as suggested by Bengio (Bengio sec. 3.1.1 page 9: “The mini-batch size (B in Eq. (1)) is typically chosen between 1 and a few hundreds, e.g. B = 32 is a good default value, with values above 10 taking advantage of the speed-up of matrix-matrix products over matrix-vector products. The impact of B is mostly computational, i.e., larger B yield faster computation […] In theory, this hyper-parameter should impact training time”).

Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Knebl et al. (US9439081B1) discloses a system and methods for forecasting network performance and implementing corrective action using key performance indicators (KPIs) and neural networks [Knebl col 1: 40-42: “One aspect of the disclosure provides a computer-implemented method of forecasting wireless network performance. The method comprises receiving historical performance data and baseline data for a cell in a network.”, col 8: 55-61: “The network forecast determination unit 220 may identify relationships between KPIs using machine-learning techniques on one or more KPIs. Machine-learning techniques may include statistical correlation analyses, regression analyses (e.g., multiple regression analyses, linear regression analyses, logistic regression analyses, etc.), neural networks, combinations of the same, or the like.”, Abstract: “network adjustments can be planned and implemented in time to preserve a good customer experience.”]. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Somie Park whose telephone number is (571)272-1056. The examiner can normally be reached 9:00am - 5:00pm, Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SOMIE PARK/Examiner, Art Unit 2126                              
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126