DETAILED ACTION
This action is in response to the claims filed 11/07/2022. Claims 21-23 are new. Claims 10, 18 and 19 are cancelled. Claims 1, 8, 9, 11, 12 and 20 are amended. Claims 1-23 are pending and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 11/07/2022 have been fully considered but they are not persuasive. 
Regarding claim 1 
Applicant argues that the ‘parallel series’ in the cited art, Claessens, is not the same as the claimed ‘usage summary vectors over multiple time spans that have a first granularity’, and further notes that 2D grids are not the same as “usage summary vectors”.
Examiner responds that, in the context of the art, the parallel series represents the state of the system being controlled. The art notes that this state can include properties such as the room temperature or the power consumption of a system. Power consumption is an indicator of the power used by a system, i.e., a usage summary. Further, the 2D grid is a two-dimensional representation of a vector in which the index is time and the value is the observed property; in this way the 2D grid is representative of a vector.
Applicant further argues that the input data is not represented as blocks over multiple time spans and points to figure 4 as support, noting that each of the blocks spans only hours 150 to 200 and therefore does not cover multiple time spans. Examiner notes that the time spans covered by the blocks can be considered to be, for example, spans of 5 hours; a span of 50 hours (from hour 150 to hour 200) is thus considered to be 10 different time spans. Furthermore, examiner notes that the “current time” is updated for each action prediction, so the input blocks to the convolutional neural network are updated with new time spans for each new action prediction.
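For illustration only, and not as part of the cited record, the arithmetic in the preceding paragraph can be sketched as follows. The function name `split_into_spans` is hypothetical, and the 5-hour span length is the example value used above rather than a value taken from Claessens.

```python
# Illustrative sketch: divide the 50-hour window of figure 4 (hours 150 to 200)
# into consecutive 5-hour time spans. The helper name is hypothetical.
def split_into_spans(start_hour, end_hour, span_hours):
    """Return (span_start, span_end) pairs covering [start_hour, end_hour)."""
    if span_hours <= 0:
        raise ValueError("span_hours must be positive")
    spans = []
    t = start_hour
    while t < end_hour:
        # The last span is clipped so it never runs past end_hour.
        spans.append((t, min(t + span_hours, end_hour)))
        t += span_hours
    return spans


spans = split_into_spans(150, 200, 5)
print(len(spans))            # 10
print(spans[0], spans[-1])   # (150, 155) (195, 200)
```

Under these assumptions the single 50-hour block yields ten distinct time spans, which is the reading relied upon above.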
Applicant argues that Claessens does not teach a “second granularity that is coarser than the first granularity”. Examiner notes that Yu, not Claessens, was relied upon to teach this limitation, which was previously presented in claim 10. While Claessens does not explicitly teach this limitation, Yu makes up for the deficiency in Claessens.
Applicant further argues that Claessens describes a convolutional layer which outputs a feature map, and that the feature map cannot correspond to the claimed “aggregating the blocks of usage summary vectors” because Claessens describes that the “neural architecture 20 takes as input a state action pair in the form of 2d Grids 12 and returns an approximated values”. Examiner notes that applicant appears to equate the “neural architecture” with the “convolutional layer”. These are not equivalent: the neural architecture includes a convolutional layer. The convolutional layer processes the input into “feature maps”, which are then used by a reinforcement network to output the approximated Q-value. Examiner contends that the convolutional operation on the input amounts to aggregation of the input data. Contrary to applicant's suggestion, this operation alone does not produce a Q-value. Examiner refers applicant to ¶0183 of the cited art.
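For illustration only, the sense in which a convolution aggregates its input can be sketched with a generic 2D filter. This is a minimal sketch of a standard valid-mode cross-correlation as commonly used in CNN layers; it is not the formula of Claessens ¶0183, which is not reproduced in this action, and the nonlinearity is omitted. The function name `conv2d_valid` and the example grid are assumptions.

```python
import numpy as np


def conv2d_valid(x, w, b=0.0):
    """Valid-mode 2D cross-correlation of grid x with filter w, plus bias b.

    Each output cell is a weighted sum over a neighbourhood of input cells,
    i.e. an aggregation of several input values into one feature value.
    """
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out


# Hypothetical 4x4 input grid and a 2x2 filter of ones (a plain sum).
grid = np.arange(16, dtype=float).reshape(4, 4)
filt = np.ones((2, 2))
feature_map = conv2d_valid(grid, filt)
print(feature_map.shape)   # (3, 3)
print(feature_map[0, 0])   # 0 + 1 + 4 + 5 = 10.0
```

Note that the output is a feature map, not a Q-value; a further network would be needed to map such features to a value estimate.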
Applicant further argues that Claessens does not teach “the characteristics including: one or more static profile features having a first time variance attribute; and one or more dynamic profile features have a second time variance attribute different than the first time variance attribute”. Examiner disagrees. The input data provided to the neural network 17 includes both features that are static in time and features that vary with time, and the two kinds of features therefore have different time variance attributes.
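For illustration only, one possible reading of features having different time variance attributes is sketched below. The record does not define how a “time variance attribute” would be computed, so the variance test, the function name `time_variance_attribute`, the feature names and the sample values are all assumptions.

```python
import numpy as np


def time_variance_attribute(series, tol=1e-12):
    """Label a feature 'static' if it does not change over time, else 'dynamic'.

    Assumption: zero variance across time steps is used as the test.
    """
    values = np.asarray(series, dtype=float)
    return "static" if np.var(values) <= tol else "dynamic"


# Hypothetical features observed over four time steps.
features = {
    "upper_temperature_bound": [22.0, 22.0, 22.0, 22.0],  # fixed by an end user
    "hour_of_day": [0, 1, 2, 3],                          # changes every step
}
labels = {name: time_variance_attribute(v) for name, v in features.items()}
print(labels)
```

Under this reading a user-set bound and a time-dependent value receive different labels, which is the distinction drawn in the paragraph above.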

Claim Objections
Claims 1-9, 11-17 and 20-23 are objected to because of the following informalities:
The independent claims use the phrase “time variance attribute”. There is no antecedent basis for the claimed terminology in the specification. The specification merely describes the profile features as being “time invariant” or as features that “change over time” in ¶0020.
In claim 11, indentation is used to denote which claim features belong to a “second path”. However, the limitation “a second path of the Machine learning architecture system … the second path including:” is not immediately followed by any limitation at a deeper indentation level. It is therefore unclear which elements belong to the second path. For the purposes of examination, the claimed “profile feature module” is interpreted as belonging to the second path, as is made clear by the limitations of claim 12.
Appropriate correction is required.


Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA  35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1-3, 6, 8-9, 11-14, 17 and 20 is/are rejected under 35 U.S.C. § 103 as being unpatentable over Claessens et al. US Document ID US20190019080A1, hereinafter Claessens, in view of Yu et al. “Deep Convolutional Neural Networks with Layer-wise Context Expansion and Attention”, hereinafter Yu.

Regarding claim 1
Claessens teaches, In a digital medium action prediction environment, a method implemented by at least one computing device, the method comprising: (¶001 “The present invention relates to methods, controllers and systems for the control of distribution systems like energy distribution systems…Controlling the demand flexibility of energy constrained flexibility (ECF) sources such as an Electric Vehicle, a Heat Pump, or an HVAC system is known based on model predictive control…When applied in a demand response setting, a desired outcome of such a control technique is a proposed power and/or energy to be consumed by any number of devices” Predicting a future demand is predicting an action, namely the energy consumption of a device.)
processing, by the at least one computing device and via a first path of a machine learning architecture system, input data to divide the input data into blocks of usage summary vectors over multiple time spans that have a first granularity; (¶0152 “A high dimensional state representation 11 can be used as an input preferably comprising a time-stepped series of aggregated state distributions [6] of either a single (see FIG. 1, reference number 12) or a parallel series (see FIG. 4, reference numbers 12 a-d) of 2D aggregated state distributions 12” As shown in figure 4, there are multiple parallel time series of blocks, and each of the time series spans a length of time. Each time series is a vector describing the usage of an entity and is provided as input to a CNN layer, i.e., the first path of a machine learning system. Each series is a first block having multiple time spans of a given first granularity.)
generating, by the at least one computing device and via the first path of the machine learning architecture system, a summary of actions over a time span from the input data by aggregating the blocks of usage summary vectors (¶0152 “A convolutional neural network architecture 2” A CNN aggregates the vectors. ¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:
    [equation image media_image1.png not reproduced]
”)
determining, by the at least one computing device and via the first path of the machine learning architecture system, long range interactions across different time frames from the summary of actions using a second neural network of the first path; (¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:  …Multiple layers can be stacked to obtain a deep architecture. Convolutional layers can be alternated optionally with pooling layers that down sample their inputs to introduce an amount of translation invariance into the network….These features extracted in the convolutional neural network 14 are then used as input by higher network layers in a fully connected neural network 15” ¶0095-0096 “A pooling layer could be added to the network. Pooling introduces translation invariance and works well for object detection, but it comes at the cost of losing location information. One option would be to extend pooling over the time dimension….LSTM (Long Short-Term Memory) layers can also be used. The LSTMs would then be responsible for learning time dependencies.” The convolutional layer can be augmented with a pooling layer, and the pooling layer can alternatively be an LSTM. An LSTM that processes the output h reveals long range interactions across different time frames.)
obtaining, by the at least one computing device, a profile from a second path of the machine-learning network architecture, the profile describing characteristics of an entity associated with the actions; (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:…” ¶0164 “A control action for each TCL is denoted in this embodiment as a binary value indicating if the TCL is in an OFF/ON state:” ¶0166 “As this state vector only comprises observable state information, e.g. the operational temperature”)
the characteristics including one or more static profile features having a first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:… ¶0161 The controllable state information xphys,k relates to a parameter that is to be controlled, e.g. graph 11 in FIG. 1 or 4, and to be kept between upper and lower bounds…where Tk i and Tk i denote the lower and upper bounds for T which can be set by an end user” The bounds define static profile variables that are unchanged over any time frame. Further, the controllable state information is held static and is not time varying. Because the state information is unchanging, its time variance attribute is time invariant, as described in ¶0020 of the specification.)
one or more dynamic profile features have a second time variance attribute different than the first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, … ¶0159 The time-dependent information component Xt contains information 11 related to time, i.e. in time steps of, for example, nanoseconds, microseconds, milliseconds, seconds, minutes, days, months” The time-dependent information constitutes dynamic profile variables having time-varying attributes, which differ from the first time variance attribute.)
generating, by the at least one computing device, an input for a third neural network by concatenating the determined long range interactions across the different time frames from a second neural network of the first path of the machine-learning network architecture and the obtained profile from the second path of the machine-learning network architecture; and generating, by the at least one computing device, a prediction of an action by the third neural network by providing the input to the third neural network and receiving the prediction of the action as output from the third neural network (¶0185 “This hidden representation is then combined with the output of the convolutional neural network 14 and the outputs of both networks 14, 17 are merged into fully connected layers 15. A final linear output layer 19 maps the combined hidden features to the predicted Q-value 18 of the input state-action pair.” ¶0184 “step 3 includes real time control whereby the control action resulting from the policy h described above, is to be converted into a product to be dispatched such as energy to be dispatched to the different devices” The outputs of both paths are fed into the final layer 19; the profile path is processed through network 17 and then input to the third neural network. As mentioned previously, the output of the CNN is augmented with an LSTM layer and embodies spatio-temporal features; the output from this path is also fed to the third neural network. Finally, the third neural network outputs the predicted Q-value for a given predicted control action.)
Claessens does not explicitly teach the time span [from the summary of actions] having a second granularity that is coarser than the first granularity.
Yu, however, when addressing issues related to pooling in convolutional neural networks, teaches the time span [from the summary of actions] having a second granularity that is coarser than the first granularity (Introduction pg 17 “A max-pooling or average-pooling layer is used to generate a lower resolution version of the convolution layer activations. The pooling layer is often important to tolerate translational variances.” The lower resolution is a coarser second granularity.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use pooling layers, which by definition lower the resolution of the activations, in the CNN component of Claessens. One would have been motivated to make such a combination because both Claessens and Yu discuss using pooling layers with convolutional neural networks, such as the convolutional neural network component of the action prediction system of Claessens. Claessens notes “Convolutional layers can be alternated optionally with pooling layers that down sample their inputs to introduce an amount of translation invariance into the network” (Claessens ¶0183).
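For illustration only, the way a pooling layer produces a lower-resolution (coarser) version of its input can be sketched with non-overlapping average pooling along the time axis. The function name `average_pool_time`, the 24-hour series and the window of 6 are assumptions and are not taken from Claessens or Yu.

```python
import numpy as np


def average_pool_time(x, window):
    """Non-overlapping average pooling along the time axis.

    Trailing samples that do not fill a complete window are dropped.
    """
    x = np.asarray(x, dtype=float)
    usable = (len(x) // window) * window
    return x[:usable].reshape(-1, window).mean(axis=1)


# Hypothetical hourly series (first, finer granularity): 24 values.
hourly = np.arange(24.0)
# Pooling over 6-hour windows gives 4 values (second, coarser granularity).
coarse = average_pool_time(hourly, 6)
print(coarse.tolist())   # [2.5, 8.5, 14.5, 20.5]
```

Each pooled value summarizes six of the original time steps, so the pooled series has a coarser time granularity than its input.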

Regarding claim 2
	Claessens/Yu teaches claim 1
Claessens teaches, wherein the second neural network used for the determining of long range interactions is a long short term memory (LSTM) neural network. (¶0095-0096 “A pooling layer could be added to the network. Pooling introduces translation invariance and works well for object detection, but it comes at the cost of losing location information. One option would be to extend pooling over the time dimension….LSTM (Long Short-Term Memory) layers can also be used. The LSTMs would then be responsible for learning time dependencies.” The CNN is augmented with an LSTM.)

Regarding claim 3
Claessens/Yu teaches claim 1
Claessens teaches, wherein the first neural network used for the generating of the summary of actions is a convolutional neural network. (¶0183 “The convolutional neural network 14 process inputs 12 structured as one or more 2-dimensional grids by convolving each input grid 12 with multiple learnt linear filters” the output of the CNN summarizes the input via the learnt filters)

Regarding claim 6
Claessens/Yu teaches claim 1
Claessens teaches, wherein the entity is a device and the action is an operation performed by the device. (¶0213-¶0214 “the present invention relate to a method of controlling demand of a physical product to be distributed to constrained cluster elements grouped in clusters in a demand response system as well as a controller…The physical product can be heat or electrical energy” the energy is controlled and distributed through a device)

Regarding claim 8
Claessens/Yu teaches claim 1
Claessens teaches, wherein the profile is a static profile that is shared across each of the different time frames. (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:… ¶0161 The controllable state information xphys,k relates to a parameter that is to be controlled, e.g. graph 11 in FIG. 1 or 4, and to be kept between upper and lower bounds…where Tk i and Tk i denote the lower and upper bounds for T which can be set by an end user” The bounds define static profile variables that are unchanged over any time frame and are shared, i.e., concatenated, with the time frames output by the LSTM referred to in claim 1.)

Regarding claim 9
Claessens/Yu teaches claim 1
Claessens teaches, wherein the profile is a dynamic profile that is shared with a corresponding time of the different time frames. (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:… ¶0159 The time-dependent information component Xt contains information 11 related to time, i.e. in time steps of, for example, nanoseconds, microseconds, milliseconds, seconds, minutes, days, months” The time-dependent information constitutes dynamic profile variables over different time frames, which are shared, i.e., concatenated, with the time frames output by the LSTM referred to in claim 1.)

Regarding claim 11
Claessens teaches, In a digital medium action prediction environment, a machine-learning architecture system for predicting intended actions comprising: (¶001 “The present invention relates to methods, controllers and systems for the control of distribution systems like energy distribution systems…Controlling the demand flexibility of energy constrained flexibility (ECF) sources such as an Electric Vehicle, a Heat Pump, or an HVAC system is known based on model predictive control…When applied in a demand response setting, a desired outcome of such a control technique is a proposed power and/or energy to be consumed by any number of devices” Predicting a future demand is predicting an action, namely the energy consumption of a device.)
a first path of the machine learning architecture system including: an input data module to process input data to divide the input data into blocks of usage summary vectors over multiple time spans that have a first granularity; (¶0152 “A high dimensional state representation 11 can be used as an input preferably comprising a time-stepped series of aggregated state distributions [6] of either a single (see FIG. 1, reference number 12) or a parallel series (see FIG. 4, reference numbers 12 a-d) of 2D aggregated state distributions 12” As shown in figure 4, there are multiple parallel time series of blocks, and each of the time series spans a length of time. Each time series is a vector describing the usage of an entity and is provided as input to a CNN layer. Each series is a first block having multiple time spans of a given first granularity.)
a first path of the machine learning architecture system including: a first neural network implemented by at least one computing device to generate a summary of actions over a time span from the input data by aggregating the blocks of usage summary vectors; (¶0152 “A convolutional neural network architecture 2” A CNN aggregates the vectors. ¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:
    [equation image media_image1.png not reproduced]
”)
a first path of the machine learning architecture system including: a second neural network implemented by the at least one computing device to determine long range interactions across different time frames from the summary of actions; (¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:  …Multiple layers can be stacked to obtain a deep architecture. Convolutional layers can be alternated optionally with pooling layers that down sample their inputs to introduce an amount of translation invariance into the network….These features extracted in the convolutional neural network 14 are then used as input by higher network layers in a fully connected neural network 15” ¶0095-0096 “A pooling layer could be added to the network. Pooling introduces translation invariance and works well for object detection, but it comes at the cost of losing location information. One option would be to extend pooling over the time dimension….LSTM (Long Short-Term Memory) layers can also be used. The LSTMs would then be responsible for learning time dependencies.” The convolutional layer can be augmented with a pooling layer, and the pooling layer can alternatively be an LSTM. An LSTM that processes the output h reveals long range interactions across different time frames.)
a second path of the machine learning architecture system operable separately from the first path the second path including: (Examiner notes that the second path, as shown in figure 4, is separate from the convolutional network; the profile features are processed through a distinct neural network 17.)
a profile feature module implemented by the at least one computing device to obtain a profile describing characteristics of an entity associated with the actions; (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:…” ¶0164 “A control action for each TCL is denoted in this embodiment as a binary value indicating if the TCL is in an OFF/ON state:” ¶0166 “As this state vector only comprises observable state information, e.g. the operational temperature”)
the characteristics including one or more static profile features having a first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:… ¶0161 The controllable state information xphys,k relates to a parameter that is to be controlled, e.g. graph 11 in FIG. 1 or 4, and to be kept between upper and lower bounds…where Tk i and Tk i denote the lower and upper bounds for T which can be set by an end user” The bounds define static profile variables that are unchanged over any time frame. Further, the controllable state information is held static and is not time varying. Because the state information is unchanging, its time variance attribute is time invariant, as described in ¶0020 of the specification.)
one or more dynamic profile features have a second time variance attribute different than the first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, … ¶0159 The time-dependent information component Xt contains information 11 related to time, i.e. in time steps of, for example, nanoseconds, microseconds, milliseconds, seconds, minutes, days, months” The time-dependent information constitutes dynamic profile variables having time-varying attributes, which differ from the first time variance attribute.)
generate an input for a third neural network of the first path by concatenating the long range interactions across the different time frames determined by the second neural network of the first path of the machine-learning architecture and the profile obtained by the profile feature module of the second path of the machine-learning architecture; and the third neural network implemented by the at least one computing device to generate a prediction of an action by receiving the input to the third neural network and outputting the prediction of the action as output from the third neural network. (¶0185 “This hidden representation is then combined with the output of the convolutional neural network 14 and the outputs of both networks 14, 17 are merged into fully connected layers 15. A final linear output layer 19 maps the combined hidden features to the predicted Q-value 18 of the input state-action pair.” ¶0184 “step 3 includes real time control whereby the control action resulting from the policy h described above, is to be converted into a product to be dispatched such as energy to be dispatched to the different devices” The outputs of both paths are fed into the final layer 19; the profile path is processed through network 17 and then input to the third neural network. As mentioned previously, the output of the CNN is augmented with an LSTM layer and embodies spatio-temporal features; the output from this path is also fed to the third neural network. Finally, the third neural network outputs the predicted Q-value for a given predicted control action.)
Claessens does not explicitly teach the time span [from the summary of actions] having a second granularity that is coarser than the first granularity.
Yu, however, when addressing issues related to pooling in convolutional neural networks, teaches the time span [from the summary of actions] having a second granularity that is coarser than the first granularity (Introduction pg 17 “A max-pooling or average-pooling layer is used to generate a lower resolution version of the convolution layer activations. The pooling layer is often important to tolerate translational variances.” The lower resolution is a coarser second granularity.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use pooling layers, which by definition lower the resolution of the activations, in the CNN component of Claessens. One would have been motivated to make such a combination because both Claessens and Yu discuss using pooling layers with convolutional neural networks, such as the convolutional neural network component of the action prediction system of Claessens. Claessens notes “Convolutional layers can be alternated optionally with pooling layers that down sample their inputs to introduce an amount of translation invariance into the network” (Claessens ¶0183).

Regarding claim 12
	Claessens/Yu teaches claim 11
Claessens teaches, wherein the first and second neural networks form a first path in the machine-learning architecture system and the profile feature module forms a second path in the machine-learning architecture system, the first and second paths connected to the third neural network. (¶0185 “This hidden representation is then combined with the output of the convolutional neural network 14 and the outputs of both networks 14, 17 are merged into fully connected layers 15. A final linear output layer 19 maps the combined hidden features to the predicted Q-value 18 of the input state-action pair.” The outputs of both paths are fed into the final layer 19; the profile path is processed through network 17 and then input to the third neural network. As mentioned previously, the output of the CNN is augmented with an LSTM layer and embodies spatio-temporal features; the output from this path is also fed to the third neural network. Finally, the third neural network outputs the predicted Q-value for a given predicted control action.)
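For illustration only, the two-path arrangement discussed above, in which the outputs of both paths are merged and passed to a final linear output layer, can be sketched as follows. This is a minimal sketch with random stand-in weights; the layer sizes, the `dense` helper and all variable names are assumptions and do not reproduce the networks 14, 15, 17 or 19 of Claessens.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b):
    """Single fully connected layer with tanh activation."""
    return np.tanh(w @ x + b)


# Stand-in for the first-path output (e.g. temporal features), size 8.
temporal_features = rng.normal(size=8)

# Second path: raw profile input of size 3 passed through its own small layer.
profile_input = rng.normal(size=3)
w_profile, b_profile = rng.normal(size=(4, 3)), np.zeros(4)
profile_features = dense(profile_input, w_profile, b_profile)

# Merge: concatenate the outputs of both paths into one input vector.
merged = np.concatenate([temporal_features, profile_features])

# Final linear output layer maps the merged features to one scalar prediction.
w_out, b_out = rng.normal(size=(1, 12)), np.zeros(1)
prediction = float((w_out @ merged + b_out)[0])
print(merged.shape)   # (12,)
```

The concatenation preserves each path's features side by side, so the final layer receives both the temporal features and the profile features as a single input.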

Regarding claim 13
Claim 13 is rejected for the reasons set forth in claim 11 and claim 3
Regarding claim 14
Claim 14 is rejected for the reasons set forth in claim 11 and claim 2
Regarding claim 17
Claim 17 is rejected for the reasons set forth in claim 11 and claim 6

Regarding claim 20
Claessens teaches, In a digital medium action prediction environment, a machine-learning architecture system for predicting intended actions comprising: (¶001 “The present invention relates to methods, controllers and systems for the control of distribution systems like energy distribution systems…Controlling the demand flexibility of energy constrained flexibility (ECF) sources such as an Electric Vehicle, a Heat Pump, or an HVAC system is known based on model predictive control…When applied in a demand response setting, a desired outcome of such a control technique is a proposed power and/or energy to be consumed by any number of devices” Predicting a future demand is predicting an action, namely the energy consumption of a device.)
means for processing input data to divide the input data into blocks of usage summary vectors over multiple time spans that have a first granularity; (¶0152 “A high dimensional state representation 11 can be used as an input preferably comprising a time-stepped series of aggregated state distributions [6] of either a single (see FIG. 1, reference number 12) or a parallel series (see FIG. 4, reference numbers 12 a-d) of 2D aggregated state distributions 12” As shown in figure 4, there are multiple parallel time series of blocks, and each of the time series spans a length of time. Each time series is a vector describing the usage of an entity and is provided as input to a CNN layer. Each series is a first block having multiple time spans of a given first granularity.)
means for generating a summary of actions over a time span from the input data by aggregating the blocks of usage summary vectors; (¶0152 “A convolutional neural network architecture 2” A CNN aggregates the vectors. ¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:
    [equation image media_image1.png not reproduced]
”)
means for determining long range interactions across different time frames from the summary of actions (¶0183 “A convolutional layer consists of multiple filters Wk, each giving rise to an output feature map. The feature map hk corresponding to the kth filter weight matrix Wk can be obtained by:  …Multiple layers can be stacked to obtain a deep architecture. Convolutional layers can be alternated optionally with pooling layers that down sample their inputs to introduce an amount of translation invariance into the network….These features extracted in the convolutional neural network 14 are then used as input by higher network layers in a fully connected neural network 15” ¶0095-0096 “A pooling layer could be added to the network. Pooling introduces translation invariance and works well for object detection, but it comes at the cost of losing location information. One option would be to extend pooling over the time dimension….LSTM (Long Short-Term Memory) layers can also be used. The LSTMs would then be responsible for learning time dependencies.” The convolutional layer can be augmented with a pooling layer, and the pooling layer can alternatively be an LSTM. An LSTM that processes the output h reveals long range interactions across different time frames.)
means for obtaining a profile describing characteristics of an entity associated with the actions; the characteristics including one or more static profile features having a first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:… ¶0161 The controllable state information xphys,k relates to a parameter that is to be controlled, e.g. graph 11 in FIG. 1 or 4, and to be kept between upper and lower bounds…where Tk i and Tk i denote the lower and upper bounds for T which can be set by an end user” The bounds define static profile variables that are unchanged over any time frame. Further, the controllable state information is information which is held static and is not time varying. Because the state information is unchanging, its variance attribute is that it is time invariant, as described in the specification ¶020.)
one or more dynamic profile features have a second time variance attribute different than the first time variance attribute (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, … ¶0159 The time-dependent information component Xt contains information 11 related to time, i.e. in time steps of, for example, nanoseconds, microseconds, milliseconds, seconds, minutes, days, months” The time-dependent information comprises dynamic profile variables having time-varying attributes, which differ from the first time variance attribute.) (¶0158 “The state space X comprises, for example, a plurality of data sets [6]: e.g. time-dependent state information Xt, controllable state information Xphys, and exogenous (uncontrollable) state information Xex:…” ¶0164 “A control action for each TCL is denoted in this embodiment as a binary value indicating if the TCL is in an OFF/ON state:” ¶0166 “As this state vector only comprises observable state information, e.g. the operational temperature”)
means for concatenating the determined long range interactions across the different time frames from the summary of actions and the obtained profile describing the characteristics of the entity associated with the actions; and means for generating a prediction of an action based on the concatenated determined long range interactions across the different time frames from the summary of actions and the obtained profile describing the characteristics of the entity associated with the actions. (¶0185 “This hidden representation is then combined with the output of the convolutional neural network 14 and the outputs of both networks 14, 17 are merged into fully connected layers 15. A final linear output layer 19 maps the combined hidden features to the predicted Q-value 18 of the input state-action pair.” ¶0184 “step 3 includes real time control whereby the control action resulting from the policy h described above, is to be converted into a product to be dispatched such as energy to be dispatched to the different devices” The outputs of both paths are fed into the final layer 19. The profile path is processed through a network 17 and is then input to the third neural network. As mentioned previously, the output of the CNN is augmented with an LSTM layer and embodies spatial-temporal features; the output from this path is also fed to the third neural network. Finally, the third neural network outputs the predicted Q-value for a given predicted control action.)
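For illustration only, the merge step quoted from Claessens ¶0185 can be sketched as follows. This is a minimal sketch and not the implementation of Claessens or of the application: the function names, the feature values, and the weights are hypothetical, and the fully connected layers 15 are collapsed into a single linear output for brevity.

```python
from typing import List


def concatenate(temporal: List[float], profile: List[float]) -> List[float]:
    """Join the temporal-path features and the profile-path features into one vector."""
    return list(temporal) + list(profile)


def linear_output(features: List[float], weights: List[float], bias: float) -> float:
    """Map the merged features to one scalar, standing in for the final linear output layer."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical values: three features from the CNN/LSTM path, two from the profile path.
temporal_features = [0.5, -1.0, 2.0]
profile_features = [1.0, 0.0]

merged = concatenate(temporal_features, profile_features)
q_value = linear_output(merged, weights=[0.1, 0.2, 0.3, 0.4, 0.5], bias=0.05)
print(merged, round(q_value, 6))
```

The sketch only shows that the prediction is computed from a single vector carrying both the temporal features and the profile features, which is the relationship relied on in the mapping above.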


Claims 4-5 and 15-16 are rejected under 35 U.S.C. § 103 as being unpatentable over Claessens/Yu in view of Adavanne et al., “A report on sound event detection with different binaural features,” hereinafter Adavanne.

Regarding claim 4
Claessens/Yu teaches claim 1
Claessens/Yu does not explicitly teach, wherein the third neural network used for the generating of the prediction is a time-distributed dense neural network.
Adavanne, however, when addressing the use of time-distributed dense layers that take RNN outputs as inputs to output a prediction, teaches wherein the third neural network used for the generating of the prediction is a time-distributed dense neural network. (pg 2 Section 2.2 “The CNN layer activation is further fed to layers of bi-directional gated recurrent units (GRU), to learn long term temporal activity patterns. This is followed by layers of time distributed fully-connected (dense) layers… The prediction layer has sigmoid activation in order to be able to produce multi-label output” A final prediction is output by the pair of time-distributed dense layers.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to adapt the final neural network of Claessens/Yu to use a time-distributed dense layer to process the RNN outputs as demonstrated in Adavanne. One would have been motivated to make such a combination because both Adavanne and Claessens/Yu discuss using a final neural network to produce a prediction based on provided RNN features. A time-distributed dense layer is used with an RNN with memory cell outputs because different cells extract different levels of dependency information, and a time-distributed dense layer is designed to weight the dependency relationships extracted from different cells.
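As a point of reference, a time-distributed dense layer applies one shared set of dense weights to every time step of a sequence of recurrent outputs. The following is a minimal sketch under that general understanding; the names, weights, and input values are hypothetical and are not taken from Adavanne or Claessens, and the sigmoid activation mentioned in Adavanne is left out for brevity.

```python
from typing import List

Vector = List[float]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Ordinary dense layer: one weight row and one bias value per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def time_distributed_dense(sequence: List[Vector], weights: List[Vector], bias: Vector) -> List[Vector]:
    """Apply the same dense layer independently to each time step of the sequence."""
    return [dense(step, weights, bias) for step in sequence]


# Hypothetical recurrent outputs: three time steps with two features each.
rnn_outputs = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
weights = [[1.0, 0.5]]  # a single output unit shared across all time steps
bias = [0.0]

outputs = time_distributed_dense(rnn_outputs, weights, bias)
print(outputs)  # [[2.0], [0.5], [2.5]]
```

The output keeps one entry per time step, so the weighting learned by the dense layer is applied uniformly across the sequence.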

Regarding claim 5
Claessens/Yu teaches claim 1
Claessens teaches, wherein the first neural network includes first and second convolutional neural networks (¶0187 “This input is processed using two 2D convolutional layers of a convolutional neural network 14. The first layer of the convolutional neural network 14 consists of four 7×7 filters, while the second layer uses eight 5×5 filters.”) the second neural network includes first and second long short term memory (LSTM) neural networks, (¶0095-0096 “LSTM (Long Short-Term Memory) layers can also be used. The LSTMs would then be responsible for learning time dependencies.” The CNN network is augmented with an LSTM, or with multiple LSTM layers of sub-networks.)
Claessens/Yu does not explicitly teach, and the third neural network includes first and second time-distributed fully connected dense neural networks
Adavanne, however, when addressing the use of time-distributed dense layers that take RNN outputs as inputs to output a prediction, teaches, and the third neural network includes first and second time-distributed fully connected dense neural networks (pg 2 Section 2.2 “The CNN layer activation is further fed to layers of bi-directional gated recurrent units (GRU), to learn long term temporal activity patterns. This is followed by layers of time distributed fully-connected (dense) layers… The prediction layer has sigmoid activation in order to be able to produce multi-label output” A final prediction is output by the pair of time-distributed dense layers.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to adapt the final neural network of Claessens/Yu to use a time-distributed dense layer to process the RNN outputs as demonstrated in Adavanne. One would have been motivated to make such a combination because both Adavanne and Claessens/Yu discuss using a final neural network to produce a prediction based on provided RNN features. A time-distributed dense layer is used with an RNN with memory cell outputs because different cells extract different levels of dependency information, and a time-distributed dense layer is designed to weight the dependency relationships extracted from different cells.
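For illustration, the two stacked convolutional layers quoted from Claessens ¶0187 for claim 5 (four 7×7 filters followed by eight 5×5 filters) can be traced with a simple shape calculation. This is a sketch only: the 28×28 input size, the stride of 1, and the absence of padding are assumptions made for the example and are not stated in the quoted passage.

```python
from typing import Tuple


def conv_output_shape(height: int, width: int, kernel: int, filters: int,
                      stride: int = 1, padding: int = 0) -> Tuple[int, int, int]:
    """Output (height, width, channels) of a square-kernel 2D convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h, out_w, filters


# Hypothetical 28x28 single-channel input grid.
first = conv_output_shape(28, 28, kernel=7, filters=4)                # four 7x7 filters
second = conv_output_shape(first[0], first[1], kernel=5, filters=8)   # eight 5x5 filters
print(first, second)  # (22, 22, 4) (18, 18, 8)
```

Each filter yields one feature map, so the channel count after each layer equals the number of filters in that layer.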
Regarding claim 15
Claim 15 is rejected for the reasons set forth in claim 11 and claim 4
Regarding claim 16
Claim 16 is rejected for the reasons set forth in claim 11 and claim 5

Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over Claessens/Yu in view of Chang et al., US Document ID US-20150046104-A1, hereinafter Chang.
Regarding claim 7
Claessens/Yu teaches claim 1
Claessens does not explicitly teach, wherein the entity is a user and the actions are performed by the user
Chang however when addressing issues related to a hybrid neural network prediction system that uses user profile information as input teaches, wherein the entity is a user and the actions are performed by the user (¶0032 “ESB analytic tool 260 may collect user statistics 240, for example, the user's keyboard 246 and mouse 244 movement frequency and/or proximity information… to advise the user with different power savings options, for example, using power control features on a computer during off hours, or advise facility managers to dim lights or adjust HVAC settings… may interact with a building management system via building management system interface(s) 270 to provide feedback to (and possibly control) all or part of a building management system.” User activity can be used to predict user actions and advise actions for the user such as advising managers to dim lights or adjust HVAC.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the product distribution and control system of Claessens/Yu to operate on user information such as user activity to advise users to adjust devices such as light or HVAC. 
One would have been motivated to make such a combination because both Chang and Claessens/Yu discuss a system which monitors and analyses usage information of devices. Chang elaborates stating that “The architecture of FIG. 1 allows for a two-way communication between an individual user of a platform and a building infrastructure including a building management system. This may provide for better feedback to a user as well as a better aggregate view of energy consumption.” (¶012 Chang)



Claims 21-23 are rejected under 35 U.S.C. § 103 as being unpatentable over Claessens/Yu in view of On et al., US Document ID US20120130805A1, hereinafter On, further in view of Keramati et al., “Improved churn prediction in telecommunication industry using data mining techniques,” hereinafter Keramati.

Regarding claim 21
Claessens/Yu teaches claim 1
Claessens/Yu does not explicitly teach, wherein the one or more static profile features comprises a user geographical location, 
On, when addressing a machine learning system capable of using user geographical location as an input feature, teaches, wherein the one or more static profile features comprises a user geographical location, (¶0037 “Various input features used for machine learning include characteristics of the user, characteristics of the content, and temporal information. The characteristics of the user used as input variables for learning comprise the age, gender, geographical location, ethnic background, education level, and the user's past choices including preferences specified by the user that determine the content” The input features to the model can be variables characteristic of a user, such as geographical location, used in order to generate predictions.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the product distribution and control system of Claessens/Yu to operate on user information such as the geographical location of the user. 
One would have been motivated to make such a combination because both On and Claessens/Yu discuss a system that uses machine learning to process input features to make a prediction. On describes that various machine learning techniques may employ a variety of features for the prediction of content. (On abstract and ¶038)
Claessens/Yu/On does not explicitly teach, and the one or more dynamic profile features comprise a software subscription age. 
Keramati, however, when addressing the use of software subscription age as an input feature, teaches, and the one or more dynamic profile features comprise a software subscription age. (Section 5.1 “We used the dataset which was randomly collected from an operator call-center's database over a 12-month period. The dataset contains 3150 customer data such as number of Call Failure (CF), number of Complains (Co), Subscription Length (SL)” Section 5 “A classifier which has been trained by train set will be used so as to predict test set records.” Section 7 “In this paper we experienced four prominent classification techniques using an Iranian telecommunication company dataset. Artificial Neural Network (ANN) significantly outperformed the other three” The art describes a machine learning classifier which uses subscription length as an input feature to a neural network. The context of the art is a subscription to a wireless service, and the length of that subscription is considered a software subscription age.)
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the product distribution and control system of Claessens/Yu/On to operate on user information such as the subscription length of the user.
One would have been motivated to make such a combination because both Keramati and Claessens/Yu/On discuss a system that uses machine learning to process input features to make a prediction. Keramati notes “a reliable …predictor will be regarded priceless” and that such a predictor uses features such as subscription length, as pointed out above. (Keramati abstract)
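The distinction between a static profile feature and a dynamic profile feature discussed for claim 21 can be illustrated with a short sketch. The class name, field names, and dates below are hypothetical and are not drawn from On, Keramati, or the application; the sketch assumes that subscription age is measured in days from a fixed start date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Union


@dataclass(frozen=True)
class Profile:
    location: str        # static feature: does not change with the evaluation date
    subscribed_on: date  # fixed start date from which the dynamic feature is derived


def profile_features(profile: Profile, today: date) -> Dict[str, Union[str, int]]:
    """Build the profile feature set as of a given evaluation date."""
    return {
        "location": profile.location,
        "subscription_age_days": (today - profile.subscribed_on).days,  # dynamic feature
    }


user = Profile(location="CA", subscribed_on=date(2022, 1, 1))
early = profile_features(user, date(2022, 1, 31))
later = profile_features(user, date(2022, 3, 2))
print(early, later)
```

Evaluated on two different dates, the location value is identical while the subscription age differs, which is the sense in which the two features have different time variance attributes.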
Regarding claim 22
Claessens/Yu teaches claim 11
Claim 22 is rejected for the reasons set forth in claim 21 in connection with claim 11
Regarding claim 23
Claim 23 is rejected for the reasons set forth in claim 21 in connection with claim 20

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/J.R.G./
Examiner, Art Unit 2122   

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122