DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s submission filed 2022-10-28 has been entered.  The status of claims is as follows:
Claims 1-17 and 19-30 are pending in the application.
Claim 18 is cancelled.
Claims 4, 10, and 24-25 are amended.

Response to Arguments
Applicant's arguments filed 2022-10-28 in response to rejections under 35 USC 103 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
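The table below lists each means-plus-function limitation by claim number, followed by the portions of the specification and drawings that the examiner has interpreted as disclosing the corresponding structure for that limitation.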
19
means for deriving a simplified version of the directed graph
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, step 201.
[0019] “The simplified version of the directed graph may be a down-sampled version of the directed graph. The down-sampling can involve reducing the resolution of the individual elements associated with the edges and vertices of the directed graph. For example, with specific reference to an ANN with convolutional and fully connected layers, the weight and filter values could be rounded off to reduce the number of bits required to represent each value. The simplification can be conducted at the graph, sector, layer, or element level.” Fig. 3, [0031], “Fig. 3 provides an illustration of one approach for executing step 201 from Fig. 2. Two sets of axes 300 and 310 illustrate one approach for deriving simplified version of direct graph 212 from directed graph 211. The x-axis of both sets of axes is "i" which is a counter variable for representing the elements of a tensor used in the execution of a directed graph. In this example, the tensor is a set of weights in a layer of an ANN represented by the directed graph. In a modern ANN, the number of weights can be quite large, for example, the tensor may include a million elements. The y-axis of graph 300 illustrates the value of the weight associated with counter "i". In this example, simplified version of directed graph 212 is obtained by down-sampling weight tensor 301 using polynomial interpolation. In this approach, polynomial 311 is derived to produce a function F(i) that will give an approximation of the value of weight wi. The polynomial can be represented by a set of coefficients equal to one plus the order of the polynomial. A computation utilizing weight tensor 301 can thereby be greatly simplified by transforming the computation into the polynomial space, and operating on the inputs to the weight layer using the much smaller coefficient tensor 312. 
Aside from the overhead associated with deriving the polynomial and transforming to and from the coefficient space, the simplified version of the directed graph will be less computationally intensive due to the reduced number of multiplications that need to take to execute the layer associated with weight tensor 301 in the directed graph and coefficient tensor 312 in the simplified version of the directed graph.”
19
means for applying a pilot input tensor to the simplified version of the directed graph
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, step 202.
[0018] “The application of an input to the directed graph can be conceptualized as the provisioning of values to the origin vertices of the graph. For example, with reference to Fig. 1, applying input tensor X to directed graph 100 involves obtaining the values of the elements of tensor X from memory and making them available to the hardware that will conduct the calculations associated with the first set of edges of directed graph 100.” [0032], “Once the simplified version of the directed graph is obtained, a pilot tensor is applied to the simplified version as described above with reference to step 202. The pilot tensor and simplified version of the directed graph are used to obtain relevant information regarding how the actual directed graph will respond when a live input tensor is applied to the directed graph. As such, the pilot input tensor can in some cases be identical to the live input tensor. However, the pilot input tensor can also be modified if needed to operate with the simplified version of the directed graph, or to further simplify execution of the simplified version of the directed graph. For example, the pilot input tensor could have a lower rank or dimensionality than the live input tensor if the simplified version of the directed graph was not compatible with the rank or dimensionality of the live input tensor. The pilot input tensor could also be a down sampled or otherwise simplified version of the live input tensor. For example, the pilot input tensor could be a version of the live input tensor in which the data structures used to store the values of the tensor have been replaced with more simplified structures.”
19
means for obtaining a collection of execution data during the application of the pilot input tensor to the simplified version of the directed graph
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, step 203.
[0022] “Data flow diagram 210 represents the pilot input tensor X being applied to the simplified version of the directed graph 212 to produce execution data 213. The execution data 213 is represented as a markup of the simplified version of the directed graph wherein highlighted portions are identified as having a near negligible contribution to the output tensor. However, the execution data can take on numerous other forms.” Fig. 4 [0033], “When the pilot input tensor is applied to the simplified version of the directed graph, execution data is obtained that will be later used to condition the execution of the directed graph. The data is generally obtained during execution of the directed graph, but can be separate and distinct from the actual values that are produced to obtain the output of the directed graph. For example, the execution data can be a set of execution data values such as the outputs of each hidden layer in an ANN. However, the execution data values can also be derived from those values via a comparison or other computation. The execution data values can represent, or can be used to derive, an approximation of the relative importance of the computation from which they were generated on the overall execution of the directed graph. For example, the execution data values could each uniquely correspond with a set of vertices in the directed graph, each vertex in the set of vertices could product a contribution to the inference tensor produced by the directed graph, and each execution data value cold be proportional in magnitude to the contribution to the inference tensor of each vertex. The execution data values can correspond to any aspect of the directed graph and can represent the importance of that aspect of the directed graph in any number of ways. In specific approaches, the relative importance will be represented by set levels such as high, medium, or low. 
However, the relative importance could be represented by a numerical value that is proportional to an impact on the inference tensor of the corresponding aspect of the directed graph. The proportionality may be linear or logarithmic.”
[0038] “Fig. 4 provides a conceptual data flow diagram for how the execution data and markup can be generated during the execution of the directed graph. As illustrated, different edges of the directed graph will be associated with different calculations 405 and 406. The two illustrated calculations are two matrix multiplications that could represent the multiplication of a set of weights with an input from a prior layer for purposes of generating a data element for the next layer in an artificially neural network. In the basic example illustrated in Fig. 4, the output of these calculations are compared to a threshold value Z. If the threshold is exceeded, the calculation is considered of high priority. If the threshold is not exceeded, the calculation is considered of low priority. In this example, the execution data is the determination made by this calculation. The execution data can then be used to contribute to a markup of the directed graph as illustrated by the different shading levels in markup 404.”
19
means for applying a live input tensor to the directed graph
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, step 205.
[0018] “The application of an input to the directed graph can be conceptualized as the provisioning of values to the origin vertices of the graph. For example, with reference to Fig. 1, applying input tensor X to directed graph 100 involves obtaining the values of the elements of tensor X from memory and making them available to the hardware that will conduct the calculations associated with the first set of edges of directed graph 100.”
[0032] “Once the simplified version of the directed graph is obtained, a pilot tensor is applied to the simplified version as described above with reference to step 202. The pilot tensor and simplified version of the directed graph are used to obtain relevant information regarding how the actual directed graph will respond when a live input tensor is applied to the directed graph. As such, the pilot input tensor can in some cases be identical to the live input tensor. However, the pilot input tensor can also be modified if needed to operate with the simplified version of the directed graph, or to further simplify execution of the simplified version of the directed graph. For example, the pilot input tensor could have a lower rank or dimensionality than the live input tensor if the simplified version of the directed graph was not compatible with the rank or dimensionality of the live input tensor. The pilot input tensor could also be a down sampled or otherwise simplified version of the live input tensor. For example, the pilot input tensor could be a version of the live input tensor in which the data structures used to store the values of the tensor have been replaced with more simplified structures.”
19
means for conditioning the execution of the directed graph, during the application of the live input tensor to the directed graph, using the collection of execution data
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, steps 204-205.
[0035], “The execution data can be utilized to produce a markup of the simplified version of the directed graph which tags the directed graph with different levels of priority such as high, medium, or low. These priority values could then be stored in association with different portions of the directed graph. The different levels of priority can describe how much of a contribution to the output tensor the various portions of the directed graph contributed. The markup can have fixed gradations or can be a heat map with smooth transitions across the graph to indicate the various levels of priority. The priority values for each edge or vertex can be calculated in real time as the directed graph is executing calculations associated with that edge or vertex. For example, the magnitude of a specific computation can be used as a proxy for the priority of that computation, and the execution data can be saved as soon as the computation has been carried out. However, the values can also be updated continuously as the graph continues to carry out the overall computation. Such approaches are beneficial where downstream calculations effectively negate the perceived impact of upstream calculations. As such, the magnitude of downstream calculations can be fed back to impact the stored execution data from prior computations along the same path through the directed graph. The effect of this feedback can be tailored based on how many layers in the directed graph have passed between the value that is being updated and the newly obtained value.” [0036], “The execution data can also be used to generate specific instructions for a later execution of the directed graph. 
For example, in the same way that the execution data can be used to generate a tag to indicate that a specific edge of the directed graph is of "low" priority, the execution data can also be used to generate an instruction to reduce the fidelity of the calculations associated with that edge of the directed graph, or to suppress the calculations associated with that edge of the directed graph. Specific approaches for conditioning the execution of the directed graph are discussed in more detail below. Many of these approaches can be triggered by reading the priority information from a tag, and triggering some form of conditional computation based off that tag. However, approaches in which the execution data is the instruction itself short circuits this intermediate lookup step by directly generating the instruction for how a portion of the directed graph should be executed at a later time.” [0039], “The execution data can be used to condition the execution of the directed graph in numerous ways. In general, the approaches used to simplify the directed graph for purposes of generating the simplified version of the directed graph can also be applied to condition the execution of the directed graph. However, as the conditional execution is being guided by information that has been obtained about the performance of the graph, the degree by which the computations are simplified can be much greater in the case of the conditioned execution than in the case of generating the simplified version. As stated previously, the steps associated with conditional execution in Fig. 2 are drawn along separate paths because in different approaches they will exhibit various temporal relationships to each other. For example, the directed graph could be primed for conditional execution prior to the conditional execution of the directed graph, using the stored execution data. 
In particular, in the approach in which the execution data is stored in the header of packets representing the directed graph, the directed graph would thereby be effectively primed for conditional execution because the priority data would be available for utilization to condition execution in real time as the payload of the packet was pulled for computation during the execution of the directed graph. The priming could include identifying the associated portion of directed graph data, packaging the execution and directed graph data into a data package, and storing the data package at a set location in memory. In another example, the execution of the directed graph will reference a separate data structure as computation is being carried out to determine if and how the associated computation should be conditioned. The separate data structure could be a markup with priorities stored in combination with identifiers of specific locations in the directed graph and the execution of the directed graph could involve obtaining the priorities from the separate data structure using the identifiers as the associated calculation was being carried out.” Fig. 5
[0040] “The execution of the directed graph can be conditioned in numerous ways. Generally, the degree to which the computation is conditioned can be set to vary across the directed graph and can include various gradations that align with the relative priority of that portion of the graph. For example, regions of relatively high priority could be computed just as they would be in the unconditionally executed directed graph, while regions of relatively low priority could be excluded from computation entirely. The various approaches for conditional computation discussed below could be mixed and assigned in various ways to the levels of priority. For example, high, medium, and low priorities could be associated with three entirely separate conditional computation schemes. As another example, the conditional computation scheme could be held constant across the directed graph, but the relative accuracy of the scheme could be modified in accordance with the priorities. For example, a degree of rounding or down-sampling could be set proportional to the priority level with a smooth transition from original value execution, to rounded value execution, to execution conducted independently of the original values. Such approaches could be efficiently applied if the priority value was a smoothly varying numerical value.” [0041] “The actual conditional execution of the directed graph can be conducted in various ways. The conditioning and the forms of conditional computation being separated concepts. Based on the execution data, the fidelity of various computations in the execution of the directed graph can be selectively decreased to different levels. For example, the conditional computation could involve decreasing the number of bits used to represent the inputs or outputs of a given computation. As another example, the data structure used to represent the inputs or outputs of a given computation could be simplified (e.g., from 8-bit floating point to 4-bit fixed point). 
As another example, the conditional computation could involve providing a fixed value in place of executing the computation. In one particular example, this value could be stored in a header of a data structure that would have been involved in the computation. As another example, the actual arithmetic portion of the computation could be simplified such that it discarded a certain number of LSBs from the computation. As another example, the computation could be suppressed altogether without even the need for providing a masked value. In even more specific approaches, replacement values for the output of the computation could be stored downstream in association with later stages of the directed graph.”
19
means for obtaining an output tensor from the conditional execution of the directed graph
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” 
“Execution of the directed graph will involve the execution of calculations associated with the edges of the directed graph, and the ultimate generation of output tensor Y. Tensor Y is therefore obtained from the directed graph and can be stored in memory as a distinct unit of data once the directed graph has been executed. Tensor Y can be an inference tensor generated by a machine intelligence system. However, the directed graphs executed by the methods of flow chart 200 can include multiple inputs or multiple outputs and can represent other computational systems besides those associated with machine intelligence.” Fig. 2, step 206.
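
For illustration only, and not as part of the claim interpretation, the sequence of steps 201-206 cited above for claim 19 can be sketched for a single weight layer as follows. The sketch assumes one of the options the specification mentions (rounding of weights as the simplification, a threshold comparison as the execution data, and substitution of a fixed value for low-priority calculations). All function names, weight values, inputs, and the threshold are hypothetical and do not appear in the specification.

```python
# Illustrative sketch only. Names, values, and the threshold are assumptions.

def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def simplify(weights, digits=1):
    """Step 201: derive a simplified layer by rounding each weight."""
    return [[round(w, digits) for w in row] for row in weights]

def collect_execution_data(simple_weights, pilot_input, threshold):
    """Steps 202-203: apply the pilot input and tag each output by magnitude."""
    outputs = matvec(simple_weights, pilot_input)
    return ["high" if abs(v) > threshold else "low" for v in outputs]

def conditional_execute(weights, live_input, execution_data):
    """Steps 204-206: compute rows tagged high; use a fixed 0.0 for rows tagged low."""
    return [
        sum(w * xi for w, xi in zip(row, live_input)) if tag == "high" else 0.0
        for row, tag in zip(weights, execution_data)
    ]

weights = [[0.52, -0.31], [0.04, 0.01], [-0.88, 0.47]]  # hypothetical layer
pilot = [1.0, 1.0]   # hypothetical pilot input tensor
live = [1.0, 2.0]    # hypothetical live input tensor

execution_data = collect_execution_data(simplify(weights), pilot, threshold=0.1)
output = conditional_execute(weights, live, execution_data)
print(execution_data)
print(output)
```

In this sketch the second row of the layer is tagged low during the pilot pass, so its multiplication is skipped when the live input is applied to the full-precision weights.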



20
means for storing the execution data in memory as stored execution data
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” [0021] “Steps 202 and 203 are illustrated as sequential because the execution data is generally available for storage in memory after the input tensor has been applied and the graph has completed execution.” Fig. 2, steps 202-203.
[0034] “However, the execution data 404 is produced and stored orthogonally to the main data flow of the directed graph. The execution data can be obtained and stored in various ways. The execution data can be obtained during the application of the input tensor to the simplified version of the directed graph by monitoring the values produced internally during the calculations associated with the edges of the directed graph.”
[0035], “The execution data can be utilized to produce a markup of the simplified version of the directed graph which tags the directed graph with different levels of priority such as high, medium, or low. These priority values could then be stored in association with different portions of the directed graph. The different levels of priority can describe how much of a contribution to the output tensor the various portions of the directed graph contributed. The markup can have fixed gradations or can be a heat map with smooth transitions across the graph to indicate the various levels of priority. The priority values for each edge or vertex can be calculated in real time as the directed graph is executing calculations associated with that edge or vertex. For example, the magnitude of a specific computation can be used as a proxy for the priority of that computation, and the execution data can be saved as soon as the computation has been carried out. However, the values can also be updated continuously as the graph continues to carry out the overall computation. Such approaches are beneficial where downstream calculations effectively negate the perceived impact of upstream calculations. As such, the magnitude of downstream calculations can be fed back to impact the stored execution data from prior computations along the same path through the directed graph. The effect of this feedback can be tailored based on how many layers in the directed graph have passed between the value that is being updated and the newly obtained value.” [0037], “The execution data can be stored in association with the portions of the directed graph to which they relate in various ways. For example, a markup could be stored in a distributed set of memory locations, or at a single memory location such that all of the data could be recalled using a single memory address or a contiguous sequence of memory addresses. The data can also be stored as an entirely separate data structure in memory. 
To use the example of 213, the heat map could be stored separately with priority levels and tags identifying specific portions of the graph. Alternatively, the data or markup can be stored directly within the data structures that represent the directed graph and can be obtained along with the data for the directed graph via a single address call to memory. For example, the execution data could be stored in packet headers where the payload of each packet was the data that represented the directed graph itself. To use the example of a directed graph that implements an ANN, the weights or filters of the ANN could be stored along with a value that represented the impact of that weight or filter on the output tensor in response to the pilot input tensor. In a specific example that is in accordance with this class of approaches, a priority value for a weight tensor and the weight tensor itself could be obtained from a memory location using a single memory address.”
20
means for priming the directed graph for the conditional execution, prior to the conditional execution of the directed graph, using the stored execution data
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” Fig. 2, steps 204-205.
[0039] “For example, the directed graph could be primed for conditional execution prior to the conditional execution of the directed graph, using the stored execution data. In particular, in the approach in which the execution data is stored in the header of packets representing the directed graph, the directed graph would thereby be effectively primed for conditional execution because the priority data would be available for utilization to condition execution in real time as the payload of the packet was pulled for computation during the execution of the directed graph. The priming could include identifying the associated portion of directed graph data, packaging the execution and directed graph data into a data package, and storing the data package at a set location in memory. In another example, the execution of the directed graph will reference a separate data structure as computation is being carried out to determine if and how the associated computation should be conditioned. The separate data structure could be a markup with priorities stored in combination with identifiers of specific locations in the directed graph and the execution of the directed graph could involve obtaining the priorities from the separate data structure using the identifiers as the associated calculation was being carried out.”
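
For illustration only, the priming described in the cited portion of [0039] (identifying the associated portion of graph data, packaging the execution data with it, and storing the package at a set location) might be sketched as below. The `Packet` class, the portion identifiers, the choice to treat untagged portions as high priority, and the use of a dictionary as memory are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass

# Illustrative sketch only. Packet, the identifiers, and the dict-based
# memory are hypothetical.

@dataclass
class Packet:
    priority: str   # header: stored execution data for this portion
    payload: list   # payload: weights representing this portion of the graph

def prime(graph_portions, stored_execution_data):
    """Package each graph portion with its priority and store it by identifier."""
    memory = {}
    for portion_id, weights in graph_portions.items():
        # Assumption: a portion with no stored execution data runs normally.
        priority = stored_execution_data.get(portion_id, "high")
        memory[portion_id] = Packet(priority, weights)
    return memory

def execute_portion(packet, inputs):
    """Read the header first; suppress the calculation when priority is low."""
    if packet.priority == "low":
        return 0.0
    return sum(w * x for w, x in zip(packet.payload, inputs))

graph_portions = {"layer1/edge0": [0.5, -0.25], "layer1/edge1": [0.01, 0.02]}
stored_execution_data = {"layer1/edge0": "high", "layer1/edge1": "low"}

memory = prime(graph_portions, stored_execution_data)
print(execute_portion(memory["layer1/edge0"], [2.0, 1.0]))
print(execute_portion(memory["layer1/edge1"], [2.0, 1.0]))
```

Because the priority travels in the header of the same package as the payload, the conditioning decision is available at the moment the payload is pulled for computation.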



21
means for generating a markup of the directed graph using the collection of execution data
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” 
[0035] “The execution data can be utilized to produce a markup of the simplified version of the directed graph which tags the directed graph with different levels of priority such as high, medium, or low. These priority values could then be stored in association with different portions of the directed graph. The different levels of priority can describe how much of a contribution to the output tensor the various portions of the directed graph contributed. The markup can have fixed gradations or can be a heat map with smooth transitions across the graph to indicate the various levels of priority. The priority values for each edge or vertex can be calculated in real time as the directed graph is executing calculations associated with that edge or vertex. For example, the magnitude of a specific computation can be used as a proxy for the priority of that computation, and the execution data can be saved as soon as the computation has been carried out. However, the values can also be updated continuously as the graph continues to carry out the overall computation.”
[0038] “The execution data can then be used to contribute to a markup of the directed graph as illustrated by the different shading levels in markup 404.” Figures 2, 4.
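
For illustration only, generating a markup with fixed gradations from execution data values could look like the sketch below. It extends the single-threshold comparison of Fig. 4 to the three levels (high, medium, low) mentioned in [0035]. The edge labels, the execution data values, and both cut-off values are hypothetical.

```python
# Illustrative sketch only. Edge labels, values, and cut-offs are assumptions.

def generate_markup(execution_values, low_cut, high_cut):
    """Tag each edge with a priority level based on the magnitude of its value."""
    markup = {}
    for edge, value in execution_values.items():
        magnitude = abs(value)
        if magnitude > high_cut:
            markup[edge] = "high"
        elif magnitude > low_cut:
            markup[edge] = "medium"
        else:
            markup[edge] = "low"
    return markup

execution_values = {"edge_a": 3.2, "edge_b": 0.4, "edge_c": -0.02}
markup = generate_markup(execution_values, low_cut=0.1, high_cut=1.0)
print(markup)
```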



22
means for storing the markup in a distributed set of memory locations
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” [0021] “Steps 202 and 203 are illustrated as sequential because the execution data is generally available for storage in memory after the input tensor has been applied and the graph has completed execution.” Fig. 2, steps 202-203.
[0037] “The execution data can be stored in association with the portions of the directed graph to which they relate in various ways. For example, a markup could be stored in a distributed set of memory locations, or at a single memory location such that all of the data could be recalled using a single memory address or a contiguous sequence of memory addresses. The data can also be stored as an entirely separate data structure in memory. To use the example of 213, the heat map could be stored separately with priority levels and tags identifying specific portions of the graph. Alternatively, the data or markup can be stored directly within the data structures that represent the directed graph and can be obtained along with the data for the directed graph via a single address call to memory. For example, the execution data could be stored in packet headers where the payload of each packet was the data that represented the directed graph itself. To use the example of a directed graph that implements an ANN, the weights or filters of the ANN could be stored along with a value that represented the impact of that weight or filter on the output tensor in response to the pilot input tensor.”
22
means for obtaining the priority value and the weight tensor from a memory location in the distributed set of memory locations using a single address
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” [0021] “Steps 202 and 203 are illustrated as sequential because the execution data is generally available for storage in memory after the input tensor has been applied and the graph has completed execution.” Fig. 2 202-203.
[0037] “The execution data can be stored in association with the portions of the directed graph to which they relate in various ways. For example, a markup could be stored in a distributed set of memory locations, or at a single memory location such that all of the data could be recalled using a single memory address or a contiguous sequence of memory addresses. The data can also be stored as an entirely separate data structure in memory. To use the example of 213, the heat map could be stored separately with priority levels and tags identifying specific portions of the graph. Alternatively, the data or markup can be stored directly within the data structures that represent the directed graph and can be obtained along with the data for the directed graph via a single address call to memory. For example, the execution data could be stored in packet headers where the payload of each packet was the data that represented the directed graph itself. To use the example of a directed graph that implements an ANN, the weights or filters of the ANN could be stored along with a value that represented the impact of that weight or filter on the output tensor in response to the pilot input tensor. In a specific example that is in accordance with this class of approaches, a priority value for a weight tensor and the weight tensor itself could be obtained from a memory location using a single memory address.”



23
means for generating a markup of the directed graph using the collection of execution data
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” 
[0035] “The execution data can be utilized to produce a markup of the simplified version of the directed graph which tags the directed graph with different levels of priority such as high, medium, or low. These priority values could then be stored in association with different portions of the directed graph. The different levels of priority can describe how much of a contribution to the output tensor the various portions of the directed graph contributed. The markup can have fixed gradations or can be a heat map with smooth transitions across the graph to indicate the various levels of priority. The priority values for each edge or vertex can be calculated in real time as the directed graph is executing calculations associated with that edge or vertex. For example, the magnitude of a specific computation can be used as a proxy for the priority of that computation, and the execution data can be saved as soon as the computation has been carried out. However, the values can also be updated continuously as the graph continues to carry out the overall computation.”
[0038] “The execution data can then be used to contribute to a markup of the directed graph as illustrated by the different shading levels in markup 404.” Figures 2, 4.
23
means for storing the markup in a distributed set of memory locations
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” [0021] “Steps 202 and 203 are illustrated as sequential because the execution data is generally available for storage in memory after the input tensor has been applied and the graph has completed execution.” Fig. 2 202-203.
[0037] “The execution data can be stored in association with the portions of the directed graph to which they relate in various ways. For example, a markup could be stored in a distributed set of memory locations, or at a single memory location such that all of the data could be recalled using a single memory address or a contiguous sequence of memory addresses. The data can also be stored as an entirely separate data structure in memory. To use the example of 213, the heat map could be stored separately with priority levels and tags identifying specific portions of the graph. Alternatively, the data or markup can be stored directly within the data structures that represent the directed graph and can be obtained along with the data for the directed graph via a single address call to memory. For example, the execution data could be stored in packet headers where the payload of each packet was the data that represented the directed graph itself. To use the example of a directed graph that implements an ANN, the weights or filters of the ANN could be stored along with a value that represented the impact of that weight or filter on the output tensor in response to the pilot input tensor.”
23
means for conditioning an update of the direct graph using the markup
[0018] “The steps of flow chart 200 can be explained with reference to conceptual data flow diagram 210. Each of the steps can be conducted by a processor operating in combination with a memory for storing the related data structures and the instructions necessary to carry out the steps.” 
[0035] “The execution data can be utilized to produce a markup of the simplified version of the directed graph which tags the directed graph with different levels of priority such as high, medium, or low. These priority values could then be stored in association with different portions of the directed graph. The different levels of priority can describe how much of a contribution to the output tensor the various portions of the directed graph contributed. The markup can have fixed gradations or can be a heat map with smooth transitions across the graph to indicate the various levels of priority. The priority values for each edge or vertex can be calculated in real time as the directed graph is executing calculations associated with that edge or vertex. For example, the magnitude of a specific computation can be used as a proxy for the priority of that computation, and the execution data can be saved as soon as the computation has been carried out. However, the values can also be updated continuously as the graph continues to carry out the overall computation. Such approaches are beneficial where downstream calculations effectively negate the perceived impact of upstream calculations. As such, the magnitude of downstream calculations can be fed back to impact the stored execution data from prior computations along the same path through the directed graph. The effect of this feedback can be tailored based on how many layers in the directed graph have passed between the value that is being updated and the newly obtained value.”
[0038] “The execution data can then be used to contribute to a markup of the directed graph as illustrated by the different shading levels in markup 404.” Figures 2, 4.
[0045], “In the specific application of an ANN the conditional computation can be used in both the generation of an inference tensor from the ANN and in training of the ANN. In approaches using back propagation, the updating of the weights during back propagation could be varied based on a known priority of that section of the network. For example, the degree to which weights are updated or modified could be limited by the priority of that portion of the ANN. Weights in highly sensitive and important portions of the neural network could be updated with high precision while weights in low sensitivity portions of the neural network could be kept constant during back propagation.”


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8, 10-17, and 19-30 are rejected under 35 U.S.C. 103 as being unpatentable over Tome et al. (“Reduced Memory Region Based Deep Convolutional Neural Network Detection”; hereinafter “Tome”) in view of Wang (US 2019/0279089 A1) and Liu et al. (“Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-Offs by Selective Execution”; hereinafter “Liu”).
As per Claim 1, Tome teaches a computer-implemented method for executing a directed graph, in which each step is conducted by a processor, comprising: 
deriving a simplified version of the directed graph, wherein the directed graph is an original directed graph (Tome, Page 3 Section  IV First Bullet, discloses:  “Scalar quantization: each CNN layer is compressed individually. All the weight values in the layer parameters are clustered using the k-means algorithm, where the number of centroids is chosen as a function of the compression factor.”  Here, Tome discloses deriving a simplified version (“quantization”) of a directed graph (“CNN”, wherein one of ordinary skill in the art will appreciate that a neural network is a directed graph)).
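For illustration only, the scalar quantization quoted from Tome above can be sketched as follows.  This is a minimal sketch under stated assumptions and is not Tome's implementation: the function names `kmeans_1d` and `quantize_layer`, the sample weights, and the choice of 2^b centroids for b bits per weight are hypothetical and introduced only for the example.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's k-means on scalars; returns the sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # assumes k <= len(values)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # Keep the previous centroid when a bucket receives no values.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)


def quantize_layer(weights, bits_per_weight):
    """Replace every weight in one layer with its nearest centroid.

    Assumption: the number of centroids is 2 ** bits_per_weight.
    """
    centroids = kmeans_1d(weights, 2 ** bits_per_weight)
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]


# Hypothetical layer of 64 weights, quantized to 2 bits per weight.
weight_rng = random.Random(1)
layer_weights = [weight_rng.gauss(0.0, 1.0) for _ in range(64)]
quantized = quantize_layer(layer_weights, bits_per_weight=2)
```

In this sketch the quantized layer keeps the same number of weights as the original layer, but those weights take at most four distinct values.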
However, Tome does not explicitly teach obtaining a collection of execution data during an execution of the simplified version of the directed graph; applying an input tensor to the directed graph, wherein the directed graph is the original directed graph and not the simplified version of the directed graph; conditioning the execution of the directed graph by selecting, during the application of the input tensor to the directed graph, computations for suppression using the collection of execution data; and obtaining an output tensor from the conditional execution of the directed graph.
Wang teaches obtaining a collection of execution data during an execution of the simplified version of the directed graph (Recall above Tome discloses deriving a simplified version of the directed graph.  Wang, Para [0049-0050], discloses:  “where qil denotes the variance of the activation value vector for the i-th neuron in the network layer to be pruned, and Ql denotes the neuron variance importance vector for the network layer to be pruned. In some embodiments of the present disclosure, when the variance of the activation value vector for a neuron is small, it indicates that the activation value of the neuron does not vary significantly for different input data (e.g., when the activation value of the neuron is always 0, it indicates that the neuron has no impact on the output result from the network). That is, a neuron having a smaller variance of its activation value vector has a smaller impact on the output result from the neural network, and on the other hand, a neuron having a larger variance of its activation value vector has a larger impact on the output result from the neural network. Hence, the variance of the activation value vector for a neuron may reflect the importance of the neuron to the neural network. If the activation value of a neuron is always maintained at a non-zero value, the neuron may be fused into another neuron.”  Here, Wang discloses calculating the “variance” of the activation value of a neuron, which requires several inputs of data to be propagated through the neural network, in order to determine its “importance”).
Tome and Wang are analogous art because they are both in the field of endeavor of compressing neural networks.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Tome and Wang.  The combination would result in one taking the “quantization” of Tome, and running several instances of input data through the “quantized” version of the neural network in order to calculate the variances of the neurons of the neural network so that one may determine which neurons may be pruned.  Tome discloses the use of the quantization as a precursor to pruning in Tome Page 3 Section IV:  “In order to reduce the parameter memory, two strategies inspired by [7] have been chosen to compress the network weights: scalar quantization and weight pruning…Finally, the two approaches are combined, either by quantizing and then pruning the resulting weights, or by pruning and then quantizing the resulting distribution.”  Wang, as discussed above, determines the pruning by calculating the “variance” of the neuron activation values, which one of ordinary skill in the art will appreciate requires a statistically significant number of inputs into the neural network.  This value can then be applied to the pruning of the original graph.
One of ordinary skill in the art would be motivated to use the quantization of Tome in order to reduce the amount of calculation needed to calculate these variances as stated in Tome Page 4 Bottom of Para 2:  “As in the case of fully connected layers, convolutional layers are more robust to scalar quantization, achieving compression factor up to 6x with a small cost in accuracy” and to then prune unnecessary connections based on the variance in order to achieve better compression while maintaining accuracy, as stated in Wang [0075]:  “In the solutions according to the present disclosure, an importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability. Hence, the neurons selected in accordance with the volume maximization neuron selection policy have greater contributions to the output result from the neural network and higher expression capabilities, while the pruned neurons are neurons having smaller contributions to the output result from the neural network and lower expression capabilities. Accordingly, when compared with the original neural network, the pruned neural network may achieve good compression and acceleration effects while having little accuracy loss.”
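For illustration only, the variance-based importance described in the Wang passage quoted above can be sketched as follows.  This is a minimal sketch, not Wang's implementation: the function name `neuron_variances`, the sample activations, and the use of the population variance are assumptions made for the example.

```python
from statistics import pvariance


def neuron_variances(activations):
    """Return one variance per neuron.

    activations[n][i] is the activation of neuron i for pilot input n.
    Assumption: population variance is used as the importance score.
    """
    num_neurons = len(activations[0])
    return [pvariance([row[i] for row in activations])
            for i in range(num_neurons)]


# Hypothetical activations of three neurons over three pilot inputs.
pilot_activations = [
    [0.0, 0.9, 0.5],
    [0.0, 0.1, 0.5],
    [0.0, 0.7, 0.5],
]
importance = neuron_variances(pilot_activations)
```

In this sketch the first neuron is always 0 and the third is held at a constant non-zero value, so both have zero variance, while the second neuron varies across inputs and receives a positive score.  Several pilot inputs are needed before any variance can be computed.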
However, the combination of Tome and Wang does not explicitly teach applying an input tensor to the directed graph, wherein the directed graph is the original directed graph and not the simplified version of the directed graph; 
conditioning the execution of the directed graph by selecting, during the application of the input tensor to the directed graph, computations for suppression using the collection of execution data; 
and obtaining an output tensor from the conditional execution of the directed graph.
Liu teaches applying an input tensor to the directed graph, wherein the directed graph is the original directed graph and not the simplified version of the directed graph (Liu, Page 4 Left Column Last Paragraph, discloses:  “It is worth noting that unlike many prior works that use deep reinforcement learning, a D2NN is not recurrent. In particular, for each input to the network (e.g. an image), each control node only executes once. In addition, the decisions of a control node completely depend on the current input.”  Here, Liu discloses applying an input tensor (an input image, which at the very least is a matrix, which is a 2-D tensor) to the directed graph (neural network)). 
conditioning the execution of the directed graph by selecting, during the application of the input tensor to the directed graph, computations for suppression using the collection of execution data; (Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”)
and obtaining an output tensor from the conditional execution of the directed graph. (Liu, Page 3 Left Column Last Paragraph, discloses:  “Fig. 2 illustrates a simple D2NN with all kinds of nodes and edges. The function nodes are drawn as rectangles or diamonds depending on whether they are regular nodes or control nodes. The input and output nodes are drawn as circles with the output nodes shaded.”  Here, Liu discloses obtaining an output tensor from the “output nodes.”)
Liu and the combination of Tome and Wang are analogous art because they are both in the field of endeavor of efficient evaluation of neural networks.
It would have been obvious before the effective filing date of the claimed invention to combine the quantization and neuron importance of Tome and Wang with the selective execution of Liu.  The combination would result in, instead of the pre-inference pruning of Wang, an on-the-fly “pruning” performed during inference execution as taught by Liu, with the variance of each neuron (as taught by Wang) incorporated into the logic of the control module of Liu (Liu, Intro Page 1:  “A control module is a sub-network whose output is a decision that controls whether other modules can execute”) that determines the conditional execution.  Thus, in summary, the quantization of Tome reduces the computational resources needed to calculate the variance, because the variance, as taught by Wang, is calculated on the simplified graph; Wang’s variance is then used as part of the control module of Liu to determine the conditional execution on the original graph.  Liu also adds the feature of the conditional execution being dependent on the input (Liu, Page 1 Intro:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”).  One of ordinary skill in the art would be motivated to add Liu’s conditional execution in order to improve computational efficiency of executing inferences on the neural network (Liu, Page 1 Abstract:  “By pruning unnecessary computation depending on input, D2NNs provide a way to improve computational efficiency.”).
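For illustration only, the combination summarized above can be sketched as follows.  This is a minimal sketch under stated assumptions and does not reproduce any reference's implementation: the function name `conditional_layer`, the sample weights and scores, and the fixed threshold are hypothetical.  In particular, the fixed threshold is a simplified stand-in for Liu's control module, which Liu describes as a learned sub-network whose decision depends on the input.

```python
def conditional_layer(inputs, weights, importance, threshold):
    """Compute one dense layer of the original graph, skipping some neurons.

    importance holds one score per neuron, assumed to have been collected
    earlier while executing the simplified graph. Neurons scoring below
    threshold are suppressed and their output is set to 0.0.
    """
    outputs = []
    executed = 0
    for row, score in zip(weights, importance):
        if score < threshold:
            outputs.append(0.0)  # computation suppressed
            continue
        outputs.append(sum(w * x for w, x in zip(row, inputs)))
        executed += 1
    return outputs, executed


# Hypothetical original (unquantized) weights, one row per neuron.
original_weights = [[1.0, 2.0], [0.5, 1.0], [3.0, 0.0]]
# Hypothetical importance scores obtained from the simplified graph.
importance_scores = [0.0, 0.12, 0.4]
live_input = [2.0, 1.0]

output, executed = conditional_layer(
    live_input, original_weights, importance_scores, threshold=0.1
)
```

In this sketch only two of the three dot products are carried out for the live input; the first neuron is skipped because its score falls below the threshold.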

As per Claim 2, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 1.  Tome teaches a simplified version of the directed graph (Tome, Page 3 Section IV First Bullet, discloses:  “Scalar quantization: each CNN layer is compressed individually. All the weight values in the layer parameters are clustered using the k-means algorithm, where the number of centroids is chosen as a function of the compression factor.”).
However, Tome does not teach applying a pilot input tensor to the simplified version of the directed graph, to conduct the execution of the simplified version of the directed graph; wherein the input tensor is a live input tensor; the pilot input tensor and the live input tensor are not identical; and the pilot input tensor and the live input tensor are stochastically dependent.
Wang teaches applying a pilot input tensor to the simplified version of the directed graph, to conduct the execution of the simplified version of the directed graph (Recall above Tome teaches a simplified version of the directed graph.  Wang, Para [0049-0050], discloses:  “where qil denotes the variance of the activation value vector for the i-th neuron in the network layer to be pruned, and Ql denotes the neuron variance importance vector for the network layer to be pruned. In some embodiments of the present disclosure, when the variance of the activation value vector for a neuron is small, it indicates that the activation value of the neuron does not vary significantly for different input data (e.g., when the activation value of the neuron is always 0, it indicates that the neuron has no impact on the output result from the network). That is, a neuron having a smaller variance of its activation value vector has a smaller impact on the output result from the neural network, and on the other hand, a neuron having a larger variance of its activation value vector has a larger impact on the output result from the neural network. Hence, the variance of the activation value vector for a neuron may reflect the importance of the neuron to the neural network. If the activation value of a neuron is always maintained at a non-zero value, the neuron may be fused into another neuron.”  Here, Wang discloses calculating the “variance” of the activation value of a neuron, which requires several inputs of data to be propagated through the neural network, in order to determine its “importance”.  Thus, this requires a plurality of “pilot” inputs to calculate this value.  
Wang suggests using this in computer vision in [0003]:  “Currently, deep neural networks have achieved enormous success in computer vision technology, such as image classification, target detection, image segmentation and the like.”  Thus, the input may be an image, which at the very least is a matrix, which is a 2-D tensor.  Thus, Wang discloses a plurality of “pilot input tensors” in order to calculate the variance, which can be done on Tome’s simplified quantized version of the neural network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, the combination of Tome and Wang does not teach wherein the input tensor is a live input tensor; the pilot input tensor and the live input tensor are not identical; and the pilot input tensor and the live input tensor are stochastically dependent.
Liu teaches wherein the input tensor is a live input tensor (Liu, Page 4 Left Column Last Paragraph, discloses:  “It is worth noting that unlike many prior works that use deep reinforcement learning, a D2NN is not recurrent. In particular, for each input to the network (e.g. an image), each control node only executes once. In addition, the decisions of a control node completely depend on the current input.”  Here, Liu discloses applying an input tensor (an input image, which at the very least is a matrix, which is a 2-D tensor) to the directed graph (neural network), in order to perform conditional execution.  This is after the variances are established by the “pilot” input data, and thus this is the “live input tensor”).
the pilot input tensor and the live input tensor are not identical (As shown above, Wang teaches using pilot input tensors to calculate the variance of the activation values of the neurons to determine neuron importance, and with the neuron importance already established, the live input tensor does not need to be identical to any particular one of the pilot input tensors.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.
Finally, the combination of Wang and Liu suggests that the pilot input tensor and the live input tensor are stochastically dependent (Liu, as shown above, teaches the live input tensor.  Wang, Para [0049], discloses:  “where qil denotes the variance of the activation value vector for the i-th neuron in the network layer to be pruned, and Ql denotes the neuron variance importance vector for the network layer to be pruned.”  One of ordinary skill in the art will appreciate that the “variance” calculated to determine the “importance” of a neuron is based on the probability distribution that results from the collection of pilot input tensors, and that any “live” input tensor must be drawn from the same population; otherwise the calculated variance is no longer applicable to the importance of the neuron for a given space of inputs.  Thus, the “live” input tensor must be stochastically dependent on (drawn from the same population as) the “pilot” input tensor for the calculated variances of the neurons to be effective.  For example, variances calculated from pilot images drawn from a population of medical images would not be effective for a live image drawn from a population of stock photography images.)

As per Claim 3, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 1.  Wang teaches further comprising: storing the execution data in memory as stored execution data (Wang, as shown above in Claim 1, discloses execution data (variance values of neuron importance).  Wang, Para [0010], discloses that the variance values must be stored for future use:  “Then, neurons to be retained are selected from the network layer to be pruned based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.”  Here, Wang states pruning is to be performed “based on the importance values”, which must have been stored in memory in order to do so.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, Wang does not teach priming the directed graph for the conditional execution, prior to the conditional execution of the directed graph, using the stored execution data.
Liu teaches priming the directed graph for the conditional execution, prior to the conditional execution of the directed graph, using the stored execution data. (Recall above Wang’s variance data that was stored for neuron importance.  Liu, Intro Page 1:  “A control module is a sub-network whose output is a decision that controls whether other modules can execute.”  Here, Liu discloses using stored data that resides in control modules in order to perform conditional execution.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 4, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 1.  Examiner notes that Claim 4 merely describes a generic neural network.  Liu teaches wherein: the directed graph includes a set of vertices and a set of edges interconnecting the set of vertices (Liu, Page 3 Section 3, discloses:  “A D2NN is a directed acyclic graph consisting of nodes and directed edges.”)  
the directed graph is a neural network (Liu, Page 1 Intro, discloses:  “This paper introduces Dynamic Deep Neural Networks (D2NN)”).
the set of edges of the directed graph are calculations involving a set of weights for the neural network, wherein the set of weights include at least one weight tensor (Liu, Page 4 Section 4, discloses weights (“parameters”):  “In particular, the output of the network cannot be expressed as a differentiable function of all trainable parameters, especially the parameters in the control nodes.”  Liu discloses using vectors, which are 1-D tensors, in Page 3 Section 3:  “An input or output node represents a real-value vector as an input or output of the network. A function node represents a (differentiable) function that maps a vector to another vector. There are two types of edges: data edges and control edges. A data edge represents a vector sent from one node to another, the same as in a conventional feedforward network.”)
at least a subset of the set of vertices are weights for the neural network; (Liu, Page 4 Section 4 Para 3, discloses that the learned parameters are for a “node”:  “We further assume that all parameters except those of this control node have been learned and fixed”.)
the conditional execution of the directed graph produces an inference tensor; (Liu, Page 6 Para 2, discloses:  “We test this hypothesis using a simple binary classification task in which the network classifies an input image as face or non-face.” Here, Liu discloses producing an inference tensor, the inference being a classification of the image, and a binary value is a scalar value, which is a 1-D tensor with 1 element.)
and the inference tensor is a response of the neural network to the input tensor. (Liu, Page 6 Para 2, discloses:  “We test this hypothesis using a simple binary classification task in which the network classifies an input image as face or non-face.” Here, Liu discloses producing an inference tensor, the inference being a classification of the image, and a binary value is a scalar value, which is a 1-D tensor with 1 element.  This is a response to the input tensor (an input image.))
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 5, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Liu teaches wherein: an edge in the set of edges is a calculation using a four dimensional tensor. (Liu, Figure 2, “As shown in FIG. 2, the AE 200 is a feed-forward neural network with one hidden layer 204. The AE 200 has an input layer L1 202, the hidden layer L2, and an output layer L3 206. If the AE 200 is a fully connected network, each node in the input layer 202 can correspond to a respective voxel or pixel of an image patch. Ignoring the bias term (the nodes labeled as +1 in FIG. 2), the input and output layers 202 and 206, respectively have the same number of nodes. The goal of an AE is to minimize the difference between the input and output vectors. If the hidden layer 204 has a size equal to or larger than the input layer 202, an AE may learn an identify transformation. To prevent such a trivial solution, an AE can be set up with a hidden layer 204 with fewer nodes than the input layer 202.” [col 4, lines 16-42]. Examiner Note: Figure 2 demonstrates that Liu’s network can include a four dimensional calculation, which could be represented as a tensor.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 6, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Tome teaches 
wherein: the deriving of the simplified version of the directed graph includes down-sampling the directed graph by a sampling factor; the simplified version of the directed graph is thereby a down-sampled version of the directed graph; (Tome, Page 3 Section 4, First Bullet, discloses:  “Scalar quantization: each CNN layer is compressed individually. All the weight values in the layer parameters are clustered using the k-means algorithm, where the number of centroids is chosen as a function of the compression factor… Henceforth, we will consider a single precision floating point representation (B = 32) for the uncompressed weights. The maximum achievable compression rate is thus 32.”  Tome continues in Page 4 Para 1:  “We assess the effect of scalar quantization by measuring the accuracy of the model after compressing the fully connected (fc) layers at a factor of 8, 10.7, 16 and 32 (4, 3, 2 and 1 bit per weight) and keeping the convolutional layers unchanged.”  Here, Tome discloses downsampling the directed graph by a factor, to a simplified version.)
a first complete set of tensors used for executing the simplified version of the directed graph has a rank; and a second complete set of tensors used for executing the directed graph has the rank.  (Tome, Page 1 Abstract, discloses inputting an image, which is represented by, at the very least, a matrix, which is a tensor of rank 2 (or in some cases, a matrix with 3 channels, which would be a tensor of rank 3).  The same image would be input to either the simplified or the original graph, so the tensors have the same rank in both cases.)
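For illustration only, the compression factors in the Tome passage quoted for Claim 6 follow from dividing the 32-bit uncompressed representation by the number of bits kept per weight.  The short check below is a sketch; the names `UNCOMPRESSED_BITS` and `compression_factor` are hypothetical and do not come from Tome.

```python
UNCOMPRESSED_BITS = 32  # single precision floating point, B = 32 per the quoted passage


def compression_factor(bits_per_weight):
    """Ratio of uncompressed bits to compressed bits for one weight."""
    return UNCOMPRESSED_BITS / bits_per_weight


# Bits per weight listed in the quoted passage: 4, 3, 2 and 1.
factors = {bits: round(compression_factor(bits), 1) for bits in (4, 3, 2, 1)}
```

The resulting values are 8.0, 10.7, 16.0, and 32.0, which match the factors of 8, 10.7, 16 and 32 recited in the quoted passage, with 32 being the maximum achievable rate at 1 bit per weight.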


As per Claim 8, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Tome teaches wherein: the deriving of the simplified version of the directed graph includes replacing a set of original values of the set of weights with a set of replacement values; and the simplified version of the directed graph has a same number of layers as the directed graph. (Tome, Page 3 Section IV First Bullet, discloses: “Scalar quantization: each CNN layer is compressed individually. All the weight values in the layer parameters are clustered using the k-means algorithm, where the number of centroids is chosen as a function of the compression factor.”  Here, Tome discloses the weights being replaced by being clustered into centroids.  Tome does not disclose removing layers in the quantization step.)

As per Claim 10, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Wang teaches wherein: the collection of execution data includes a set of execution data values; (Wang, Para [0049-0050], discloses:  “where qil denotes the variance of the activation value vector for the i-th neuron in the network layer to be pruned, and Ql denotes the neuron variance importance vector for the network layer to be pruned. In some embodiments of the present disclosure, when the variance of the activation value vector for a neuron is small, it indicates that the activation value of the neuron does not vary significantly for different input data (e.g., when the activation value of the neuron is always 0, it indicates that the neuron has no impact on the output result from the network). That is, a neuron having a smaller variance of its activation value vector has a smaller impact on the output result from the neural network, and on the other hand, a neuron having a larger variance of its activation value vector has a larger impact on the output result from the neural network. Hence, the variance of the activation value vector for a neuron may reflect the importance of the neuron to the neural network. If the activation value of a neuron is always maintained at a non-zero value, the neuron may be fused into another neuron.”  Here, Wang discloses calculating the “variance” of the activation value of each neuron, and thus teaches a set of execution data values.)
the set of execution data values and the set of vertices have uniquely corresponding elements (Wang, shown above, discloses an element of the set of execution data (a variance) that corresponds to each vertex (neuron)).
each uniquely corresponding vertex in the set of vertices produces a contribution to the inference tensor in response to a pilot input tensor; (Wang, Para [0075], discloses:  “In the solutions according to the present disclosure, an importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability.”  Here, Wang discloses that each neuron has some contribution to the output of the neural network (an inference tensor) in response to the pilot input tensor.)
and each execution data value in the set of execution data values is proportional in magnitude to the contribution to the inference tensor of each uniquely corresponding vertex in the set of vertices. (Wang, Para [0075], discloses:  “In the solutions according to the present disclosure, an importance value of a neuron reflects a degree of impact the neuron has on an output result from the neural network, and a diversity of a neuron reflects its expression capability.”  Here, Wang discloses above that the execution data value indicates the “degree of impact” on the output result.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
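For illustration only, the per-neuron variance described in the passage quoted from Wang can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the sample-by-neuron layout of the activation table are hypothetical, and the use of population variance (rather than sample variance) is an assumption, since the quoted passage does not specify it.

```python
from statistics import pvariance


def neuron_variances(activations):
    """Return one variance per neuron.

    activations[s][i] is the activation of neuron i for input sample s
    (hypothetical layout). Each column is the activation value vector
    of one neuron.
    """
    num_neurons = len(activations[0])
    return [
        pvariance([sample[i] for sample in activations])
        for i in range(num_neurons)
    ]


# Three input samples, three neurons.
example = [
    [0.0, 1.0, 5.0],
    [0.0, 2.0, 5.0],
    [0.0, 3.0, 5.0],
]
variances = neuron_variances(example)
```

In this example the first neuron is always 0 and the third is always 5, so both have zero variance, while the second neuron varies with the input. This yields exactly one execution data value per neuron, matching the one-to-one correspondence discussed above.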

As per Claim 11, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Wang teaches further comprising: storing the execution data in a distributed set of memory locations; (Wang, as shown above in Claim 1, discloses execution data (variance values of neuron importance).  Wang, Para [0010], discloses that the variance values must be stored for future use:  “Then, neurons to be retained are selected from the network layer to be pruned based on the importance values and diversity values of the neurons in the network layer to be pruned in accordance with a volume maximization neuron selection policy.”.  Here, Wang states pruning is to be performed “based on the importance values”, which must have been stored in memory in order to do so.  Wang, Para [0123], discloses:  “Based on the same concept as the above method, a storage medium (which can be a non-volatile machine readable storage medium) is provided according to some embodiments of the present disclosure. The storage medium stores a computer program for neural network pruning. The computer program includes codes configured to: determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.”  Here, Wang discloses that the program to perform these operations is on a computer readable storage medium, which is a distributed set of memory locations.)
obtaining, from a memory location in the distributed set of memory locations using a single address, both: (i) a subset of execution data from the execution data; and (ii) a weight tensor from the set of weights; (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.” Here, Wang discloses obtaining from a memory location with a single address, a subset (which can be one element) of execution data, the execution data being one (“for each”) activation’s importance value, and also from a single memory location getting the single “connecting weight” associated with the activation value.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, Wang does not teach and wherein the conditioning of the execution of the directed graph is conducted in real time using the execution data and the set of weights.
Liu teaches and wherein the conditioning of the execution of the directed graph is conducted in real time using the execution data and the set of weights. (Recall Wang above discloses execution data and set of weights.  Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”  Here, Liu discloses conditioning execution in real time.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 12, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Wang teaches further comprising: generating a markup of the directed graph using the collection of execution data; (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.”  Here, Wang discloses storing importance values associated with the neurons in the directed graph, and this is used in the future to prune the graph, thus the importance values function as a markup of the directed graph indicating the values for each neuron.)
storing the markup in a distributed set of memory locations; (Wang, Para [0123], discloses:  “Based on the same concept as the above method, a storage medium (which can be a non-volatile machine readable storage medium) is provided according to some embodiments of the present disclosure. The storage medium stores a computer program for neural network pruning. The computer program includes codes configured to: determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.”  Here, Wang discloses that the program to perform these operations is on a computer readable storage medium, which is a distributed set of memory locations.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, Wang does not explicitly teach and conditioning an update of the set of weights using the markup.
Liu teaches and conditioning an update of the set of weights using the markup. (Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”  Here, Liu discloses executing the neural network conditionally based on “a control module whose output is a decision that controls whether other modules can execute” as stated on Page 1.  Liu also discloses training, which updates the weights, in Page 5 Top Right Column:  “Joint Training of All Nodes We have described how to train a single control node assuming the parameters of all other nodes are fixed. We now describe how to extend this strategy to all nodes including additional control nodes as well as regular nodes.”  Furthermore, Wang also discloses using training on the adjusted weights in [0122]:  “Here, the processor 1401 may be further operative to execute the at least one machine executable instruction to: train the neural network having the weights adjusted, by using predetermined training data.”)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 13, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 4.  Wang teaches further comprising: generating a markup of the directed graph using the collection of execution data; (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.”  Here, Wang discloses storing importance values associated with the neurons in the directed graph, and this is used in the future to prune the graph, thus the importance values function as a markup of the directed graph indicating the values for each neuron.)
wherein the markup identifies a priority value for a weight tensor; (Wang, Para [0010] above, discloses that the markup identifies “importance” of an activation value associated with a weight, and thus identifies a priority value of the associated weight tensor.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, Wang does not explicitly teach and wherein conditioning of the execution of the directed graph uses the markup.
Liu teaches and wherein conditioning of the execution of the directed graph uses the markup (Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”  Here, Liu discloses executing the neural network conditionally based on “a control module whose output is a decision that controls whether other modules can execute” as stated on Page 1.  Recall above that Wang disclosed markup which comprises importance scores for each node.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 14, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 13.  Wang teaches storing the markup in a distributed set of memory locations (Wang, Para [0123], discloses:  “Based on the same concept as the above method, a storage medium (which can be a non-volatile machine readable storage medium) is provided according to some embodiments of the present disclosure. The storage medium stores a computer program for neural network pruning. The computer program includes codes configured to: determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.”  Here, Wang discloses that the program to perform these operations is on a computer readable storage medium, which is a distributed set of memory locations.)
and obtaining the priority value and the weight tensor from a memory location in the distributed set of memory locations using a single address. (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.” Here, Wang discloses obtaining from a memory location with a single address, a subset (which can be one element) of execution data, the execution data being one (“for each”) activation’s importance value, and also from a single memory location getting the single “connecting weight” associated with the activation value.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 15, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 13.  Wang teaches storing the markup at a single memory location (Wang, Para [0123], discloses:  “Based on the same concept as the above method, a storage medium (which can be a non-volatile machine readable storage medium) is provided according to some embodiments of the present disclosure. The storage medium stores a computer program for neural network pruning. The computer program includes codes configured to: determine importance values of neurons in a network layer to be pruned based on activation values of the neurons.”  Here, Wang discloses that the program to perform these operations is on a computer readable storage medium, which is at least one single memory location.)
wherein the conditioning of the execution of the directed graph further comprises: obtaining the markup from the single memory location; obtaining a first subset of the set of weights from memory; (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.” Here, Wang discloses obtaining from a memory location with a single address, a subset (which can be one element) of execution data, the execution data being one (“for each”) activation’s importance value, and also from a single memory location getting the single (subset of one) “connecting weight” associated with the activation value.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tome and Wang for at least the reasons recited in Claim 1.
However, Wang does not explicitly teach and wherein the first subset is selected using the markup.  
Liu teaches and wherein the first subset is selected using the markup.  (Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”  Here, Liu discloses executing the neural network conditionally based on “a control module whose output is a decision that controls whether other modules can execute” as stated on Page 1.  Recall above that Wang disclosed markup which comprises importance scores for each node.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 16, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 13.  Wang teaches weight tensor based on the priority value  (Wang, Para [0010], discloses:  “With the method for neural network pruning according to the embodiment of the present disclosure, first, for each neuron in a network layer to be pruned, an importance value of the neuron is determined based on an activation value of the neuron and a diversity value of the neuron based on connecting weights between the neuron and neurons in a next network layer.” Here, Wang discloses an importance value for the activation value and associated weights.)
However, Wang does not teach wherein the conditioning of the execution of the directed graph further comprises: reducing an accuracy of a computation using the weight tensor based on the priority value.
Liu teaches wherein the conditioning of the execution of the directed graph further comprises: reducing an accuracy of a computation using the weight tensor based on the priority value. (Liu, Page 6 Top Left Paragraph, discloses:  “The control node Q makes choices between a high-capacity node N2 and a low-capacity node N3; the low-capacity node has fewer neurons and uses less computation (please see the detailed configurations of each node in the Appendix). The intuition is that the control Q node can save computation by choosing the low-capacity node for easy examples.”  Here, Liu teaches that the conditional execution may choose a lower accuracy route through the directed graph.  Recall that Wang disclosed a priority value, and therefore in combination, Liu and Wang suggest reducing accuracy of a computation using the weight tensor based on the priority value.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claim 17, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 13 as well as conditioning of the execution (see Liu in Rejection to Claim 1) and markup (see Wang in Rejection to Claim 13).  Tome teaches wherein the conditioning of the execution of the directed graph further comprises: obtaining a first subset of weights from the set of weights from memory; replacing a set of original values of a second subset of the set of weights with a set of replacement values;  (Tome, Page 3 Section IV First Bullet, discloses: “Scalar quantization: each CNN layer is compressed individually. All the weight values in the layer parameters are clustered using the k-means algorithm, where the number of centroids is chosen as a function of the compression factor.”  Here, Tome discloses the weights being replaced by being clustered into centroids.  When combined with the conditional execution of Liu, Tome suggests conditionally replacing weights with quantized values.)
However, Tome does not teach and wherein the first subset of weights is selected using the markup.  
Liu teaches and wherein the first subset of weights is selected using the markup.  (Liu, Page 1 Intro Para 1, discloses:  “That is, given an input, only a subset of neurons are executed, and the particular subset is determined by the network itself and dependent on the particular input.”  Here, Liu discloses executing the neural network conditionally based on “a control module whose output is a decision that controls whether other modules can execute” as stated on Page 1.  Recall above that Wang disclosed markup which comprises importance scores for each node.)
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Liu with Tome and Wang for at least the reasons recited in Claim 1.

As per Claims 19-23, these are system claims corresponding to method claims 1, 3, 13, 14, and 12 respectively, and are rejected for similar reasons.

As per Claims 24-30, these are computer-implemented method claims corresponding to Claims 1, 2, 3, 6, 8, 13, and 16, respectively.  One difference is that these are directed to a neural network instead of a directed graph.  Examiner points out that a neural network is a directed graph, and notes that Tome Page 5 Conclusion, discloses:  “In this paper we present a detailed study of the effect of neural network compression.”  The other difference is the additional limitation wherein the collection of execution data is obtained and stored orthogonally to a main data flow of the neural network.  Wang, Para [0049], discloses that the execution data is “variance”:  “where qil denotes the variance of the activation value vector for the i-th neuron in the network layer to be pruned, and Ql denotes the neuron variance importance vector for the network layer to be pruned.”  Examiner notes that this is not the same as the values of the weights, or information gathered from training, but is “orthogonal” data that is distinct from the values of the neural network itself, and is instead a sort of metadata describing the cumulative results of sending inputs through the network.  Therefore, Claims 24-30 are rejected for the same reasons as Claims 1, 2, 3, 6, 8, 13, and 16, respectively.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Tome, Wang, and Liu, further in view of Mahmoudzadeh et al. (“Evaluation of Interpolation Effects on Upsampling and Accuracy of Cost Functions-Based Optimized Automatic Image Registration”; hereinafter “Mahmoudzadeh”).
As per Claim 7, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 6.  However, the combination of Tome, Wang, and Liu does not teach wherein: the down-sampling of the directed graph utilizes polynomial interpolation.
Mahmoudzadeh teaches wherein: the down-sampling of the directed graph utilizes polynomial interpolation (Mahmoudzadeh, 1.1.4, “B-spline interpolation uses weighted voxel values in a wider neighborhood compared to trilinear interpolation, but both the B-spline and trilinear kernels are symmetrical and separable. The place of the neighboring points as control points relates to B-spline interpolation and combines the intensity values at these places using a set of polynomial basis according to (5) [16].
Equation (5) shows k-order B-spline with n + 1 control points ($P_1, P_2, \ldots, P_n$),
$P(t) = \sum_{i=1}^{n+1} N_{i,k} P_i, \quad t_{\min} \le t < t_{\max}. \quad (5)$
In (5), $N_{i,k}$ are the polynomial functions of order k (degree k − 1), and n is the number of control points; k must be at least 2 (linear) and less than n + 1.
$P(t)$ is validly defined for $t_{\min} \le t < t_{\max}$ where $t_{\min} = t_k$ and $t_{\max} = t_{n+2}$. A knot vector ($t_1, t_2, \ldots, t_{k+(n+1)}$) must be determined. This specifies the values of t at which the pieces of curve join, like knots joining bits of string. It is important to note that the degree of the weighting polynomial (the order of the curve) is not dependent on the number of control points, n [17]. The weighting polynomial can be recursively defined by the following equation [18]”. Examiner Note: When combined into Tome, Wang, and Liu’s neural network simplification, Mahmoudzadeh’s polynomial interpolation method would result in the down-sampling of a directed graph utilizing polynomial interpolation.).
Mahmoudzadeh and the combination of Tome, Wang, and Liu are analogous art because they are directed towards enhanced data processing methods. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Tome, Wang, and Liu’s neural network simplification with Mahmoudzadeh’s polynomial interpolation method. The modification would have been obvious to one of ordinary skill in the art because they would have been motivated to reduce computational demand of the resulting neural network while maintaining acceptable accuracy, which can be accomplished through a decrease of resolution via polynomial interpolation (Mahmoudzadeh, abstract).
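For illustration only, the recursively defined weighting polynomial mentioned at the end of the passage quoted from Mahmoudzadeh can be sketched with the standard Cox-de Boor recursion. This is a minimal sketch under stated assumptions: the recursive equation itself is not reproduced in this action, so the standard recursion is substituted here; the function names, the zero-based indexing, and the uniform knot vector in the example are hypothetical.

```python
def basis(i, k, t, knots):
    """Cox-de Boor basis function of order k (degree k - 1), zero-based index i."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    # Terms with a zero denominator (repeated knots) are treated as zero.
    left = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * basis(i, k - 1, t, knots)
    right = 0.0
    if right_den != 0:
        right = (knots[i + k] - t) / right_den * basis(i + 1, k - 1, t, knots)
    return left + right


def bspline_point(t, control_points, k, knots):
    """Weighted sum of control points, as in equation (5)."""
    return sum(basis(i, k, t, knots) * p for i, p in enumerate(control_points))


# Five scalar control points, order 3 (quadratic), uniform knots.
control_points = [0.0, 1.0, 2.0, 3.0, 4.0]
order = 3
knots = [float(j) for j in range(len(control_points) + order)]
# With zero-based indexing the curve is defined for knots[order - 1] <= t < knots[len(control_points)].
value = bspline_point(2.5, control_points, order, knots)
```

Inside the valid parameter range the basis functions sum to one, so each interpolated value is a weighted average of neighboring control points.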

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Tome, Wang, and Liu, further in view of Yao et al. (US 2018/0046894 A1; hereinafter “Yao”).
As per Claim 9, the combination of Tome, Wang, and Liu teaches the computer-implemented method from claim 8.  However, the combination of Tome, Wang, and Liu does not teach wherein the replacing comprises one of: reducing a number of bits used to represent the set of original values to obtain the set of replacement values; and calculating the set of replacement values using a set of exponents of the set of original values.
Yao teaches wherein the replacing comprises one of: reducing a number of bits used to represent the set of original values to obtain the set of replacement values (Yao, “Using short fixed-point numbers instead of long floating-point numbers is efficient for implementations on the FPGA platform and can significantly reduce memory footprint and bandwidth requirements. A shorter bit width is always wanted, but it may lead to a severe accuracy loss. Though fixed-point numbers have been widely used in ANN accelerator designs, there is no comprehensive investigation on different quantization strategies and the tradeoff between the bit length of fixed-point numbers and the accuracy” [0096].); and
calculating the set of replacement values using a set of exponents of the set of original values (Yao, “For a fixed-point number, its value can be expressed as (9), where bw is the bit width of the number and fl is the fractional length which can be negative.” [0098-0099]. Equation 9).
Yao and the combination of Tome, Wang, and Liu are analogous art because they relate to neural network efficiency improvements. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Tome, Wang, and Liu’s simplified neural network with Yao’s neural network compression. The modification would have been obvious to one of ordinary skill in the art because they would have been motivated to reduce the computational demand of neural networks, which can be accomplished by Yao’s quantization (Yao, abstract, [0002]).
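For illustration only, replacing a floating-point weight with a fixed-point value of bit width bw and fractional length fl can be sketched as follows. This is a minimal sketch and not Yao's implementation: Equation 9 is not reproduced in this action, so the sketch assumes the common convention that a fixed-point value equals a signed bw-bit integer multiplied by 2 to the power of −fl; the function name and the saturating behavior are hypothetical.

```python
def to_fixed_point(x, bw, fl):
    """Quantize x to a signed bw-bit integer scaled by 2 ** (-fl).

    Assumed convention: value = integer * 2 ** (-fl), where fl may be negative.
    Out-of-range values saturate at the largest or smallest representable integer.
    """
    scale = 2.0 ** fl
    q_min = -(2 ** (bw - 1))
    q_max = 2 ** (bw - 1) - 1
    q = round(x * scale)
    q = max(q_min, min(q_max, q))
    return q / scale


approx = to_fixed_point(0.3, bw=8, fl=4)         # nearest multiple of 1/16
saturated = to_fixed_point(100.0, bw=8, fl=4)    # clamps at 127 / 16
coarse = to_fixed_point(12.0, bw=8, fl=-2)       # negative fl: steps of 4
```

Under this convention a shorter bit width narrows the representable range, and the fractional length sets the step size as a power of two, which reflects the tradeoff between bit length and accuracy noted in the quoted passage.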

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yao et al. (US 2019/0188567 A1), discloses in [0021]:  “As described above, deep neural networks (DNNs) may include many layers and many parameters that place a heavy burden on devices during implementation. As is discussed herein, a trained (e.g., pre-trained) deep neural network model having full connectivity, convolutional layers, fully connected layers, or the like between available connections and weights or parameters for each of such connections, convolutional layers, fully connected layers, or the like may be received for compression. For example, such DNNs may be characterized as dense DNNs. The compression discussed herein may include iterative pruning and splicing operations and parameter weight update operations. Such pruning operations (e.g., disconnecting an available connection at a particular iteration) may compress the DNN model by removing unimportant connections and such splicing operations (e.g., reconnecting previously disconnected available connections at a particular iteration) may provide recovery for pruned connections that are found to be important over the iterations. Such techniques provide dynamic network surgery for learning lossless highly sparse DNNs. Such techniques may be performed on the fly to compress a pre-trained (i.e., fully trained) DNN model.”  Here, Yao discloses that one may prune, and then reconnect (“splice”) a NN, meaning that one could subsequently perform inference on the original model.  Yao also discloses pruning “unimportant” connections.
Rabinovich et al. (US 2017/0262737 A1) discloses in [0042]:  “This approach can be taken to modify and improve the architecture of any off-the-shelf convolutional neural network. By following the inventive approach of the present disclosure, any neural network can be improved by (a) identifying the information gain bottleneck in its structure, (b) applying the structure of the predictions to alleviate the bottleneck, and finally (c) determining the depth of specialists pathways,” and also in [0068]:  “Learning is performed to determine whether or not to use a computation block.”  Thus, Rabinovich discloses determining which parts of a NN to execute based on information gain from that part of the NN.
Feng et al. (“Learning The Structure of Deep Convolutional Networks”), discloses in the Abstract:  “The ibpCNN automatically adapts its structure to provided training data, achieves an optimal balance among model complexity, data fidelity and training loss, and thus offers better generalization performance”, and also discloses on Page 2750 Para 2:  “In this work, we propose a novel Grow-And-Prune (GAP) algorithm to optimize the structure of each IBP layer in ibpCNN, during the training process. The GAP algorithm conducts complementary model growing and pruning to find optimal layer configuration efficiently, and is guaranteed to converge to the optimal layer configuration.”  Thus, Feng determines where to use the full or expanded model, and where to prune the model, in an adaptive fashion.
Ko (“Adaptive weight compression for memory-efficient neural networks”), Abstract, discloses “This paper presents an application of JPEG image encoding to compress the weights by exploiting the spatial locality and smoothness of the weight matrix. To minimize the loss of accuracy due to JPEG encoding, we propose to adaptively control the quantization factor of the JPEG algorithm depending on the error-sensitivity (gradient) of each weight”.  Here, Ko discloses compressing weights by a quantization factor depending on the sensitivity of each weight.
Han (“Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding”), Abstract, discloses:  “To address this limitation, we introduce “deep compression”, a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding.”  Here, Han discloses combining pruning and quantizing to reduce the resource usage of a neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126
/VIKER A LAMARDO/Primary Examiner, Art Unit 2126