Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Claims 21-22 invoke § 112(f). In these claims, the limitation of “a processing mechanism…wherein during operation, the processing mechanism: receives… obtains… compiles” recited in claim 21 invokes § 112(f). 
Here, the term uses a substitute for “means” that is a generic placeholder. Specifically, “mechanism” is a generic placeholder. See MPEP § 2181(I)(A): “The following is a list of non-structural generic placeholders that may invoke 35 U.S.C. 112(f): ‘mechanism for,’…”
The generic placeholder “mechanism” is modified by functional language, which includes “wherein during operation, the processing mechanism: receives… obtains… compiles.” Therefore, while the above term does not use the specific “means for (function)” form, its meaning is equivalent to, e.g., a processing means for receiving, obtaining, and compiling during operation.
Furthermore, the generic placeholder is not “modified by sufficient structure, material, or acts for performing the claimed function” (MPEP § 2181(I)). Here, the limitation “that executes on the at least one processor” is not a modifier that recites sufficient structure, material, or acts for performing the claimed function because the role of the “processor” in relation to the recited operations is not recited in the claim. In particular, it is unclear whether the “processing mechanism” is entirely software code that is executed by the processor, or is instead a hardware subcomponent of the processor.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 21-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

In claim 21, the limitation “a processing mechanism…wherein during operation, the processing mechanism: receives… obtains… compiles” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. 
Here, the specification does not disclose what structure constitutes the “processing mechanism” of the claim limitation recited above. The term “processing mechanism” is used only in the claims and does not appear in the specification. Thus, in this case, the specification does not disclose what structure constitutes a “processing mechanism.”
Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. 
Dependent claim 22 is also rejected for the same reasons given above because it incorporates the limitations of claim 21 without curing the deficiencies thereof.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 1-6, 8-9, 11-16, 18-19, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Hostetler, “Toward Runtime-Throttleable Neural Networks,” arXiv:1905.13179v1 [cs.LG] 30 May 2019 (“Hostetler”) in view of Rotem et al., “Glow: Graph Lowering Compiler Techniques for Neural Networks,” arXiv:1805.00907v3 [cs.PL] 3 Apr 2019 (“Rotem”).
As to claim 1, Hostetler teaches a method for facilitating dynamic runtime execution of a deep neural network (DNN), comprising: [Abstract: “As deep neural network (NN) methods have matured, there has been increasing interest in deploying NN solutions to ‘edge computing’ platforms…This paper presents an approach to creating runtime-throttleable NNs that can adaptively balance performance and resource use in response to a control signal.” § 6, paragraph 1: “performance can be varied dynamically to produce a range of trade-offs between task performance and resource consumption.”]
receiving a model, a set of weights [§ 3.3, paragraph 1: “To examine the generality of our TNN concept, we created throttleable versions of several popular CNN architectures, summarized in Table 1.” That is, the CNN models shown in Table 1 read on the limitation of “a model.” The limitation of “a set of weights” is disclosed because the models shown in Table 1 are neural network models, which therefore have weights. See, e.g., § 3.2, paragraph 2: “In convolution layers, the blocks are sets of convolutional filters; in fully-connected layers, the blocks are sets of neurons.” (Note that “convolutional filters” are weights in a convolutional neural network.) See also § 5.2, paragraph 1: “For ImageNet, we used pre-trained weights to initialize the data path, then fine-tuned the weights with gating.”] and runtime metadata for the DNN; [§ 3.1, paragraph 1: “fθ… is a vector of components with parameters θ, gψ… is the gating function with parameters ψ…” In Algorithm 1 (on page 6), the parameters θ and ψ are trained, as described in § 4, last paragraph: “In the first phase, we train the ‘data path’ with random gating to optimize only L while being ‘compatible’ with gating. In the second phase, we train the gating controller to optimize the full objective J while keeping the data path fixed.” That is, f and g define a data path and gating function. Thus, f (including the data path parameters θ) and the parameter ψ of the gate controller correspond to “runtime metadata for the DNN” in the context of throttling operations.]
obtaining code to perform inference-processing operations for the DNN; [§ 5 teaches experiments that “compare different approaches to creating TNNs using gating in image classification and object detection task.” See captions of FIGS. 3 and 4. See also Appendix A: “All of our experiments are implemented using the PyTorch library, versions 0.3.1, 0.4, and 1.0, and run on Nvidia GTX 1080 Ti and RTX 2080 Ti GPUs.”  Since the reference teaches the execution of the neural network using a Python library, and the execution is performed on a computer, the instant limitation, which reads on the act of obtaining computer executable instructions, is implied by the cited reference.] and
[…] a runtime engine that facilitates throttling operations during execution of the inference-processing operations, [Abstract: “This paper presents an approach to creating runtime-throttleable NNs that can adaptively balance performance and resource use in response to a control signal. Throttleable networks allow intelligent resource management, for example by allocating fewer resources in ‘easy’ conditions or when battery power is low.” The acts of resource management are considered to be “throttling operations.” As to the limitation of “runtime engine,” this limitation is implicitly disclosed because the neural network is run with the “gate controller” described in § 4.2 to implement throttling. Furthermore, Appendix A teaches: “All of our experiments are implemented using the PyTorch library,” implying that the gate controller is run in a runtime environment that uses the PyTorch library. Thus, the learned gate controller may be regarded as a “runtime engine.”] wherein the runtime engine conserves computing resources by selecting portions of the inference-processing operations to execute based on the runtime metadata. [§ 4, paragraphs 1-2: “The goal of training a throttleable network is to create a model that varies its complexity in response to the control parameter u.…the control parameter u determines the number of gated blocks that should be used, and the choice of which blocks to turn on is made according to a fixed rule.” Note that the “blocks” refer to “blocks of neurons,” as described in § 3, paragraph 1. That is, the blocks to turn on correspond to “portions of the inference-processing operations to execute.”]
Hostetler does not explicitly teach the limitation of “compiling code to implement” the runtime engine.
Rotem, in an analogous art, teaches the above limitation. Rotem relates to “graph lowering compiler techniques for neural networks” (title), and is therefore in the same field of endeavor as the claimed invention, namely machine learning. Rotem generally pertains to “a machine learning compiler for heterogeneous hardware,” known as “Glow” (see abstract, first sentence).
	In particular, Rotem teaches “compiling code to implement” a runtime engine [Abstract: “This paper presents the design of Glow, a machine learning compiler for heterogeneous hardware.” § 2.1: “Glow is designed to consume a neural network compute graph, optimize it, and code generate for it for a diverse set of backends in a more scalable way.” § 5, paragraph 1: “The Glow CPU backend compiles the low-level intermediate representation into an optimized stream of instructions.” That is, the compiler compiles the model (see § 6, paragraph 1) so that it can be executed. Since a model may include a runtime engine (note that Hostetler already teaches that a “model” may include the throttling controller which corresponds to the “runtime engine”), Rotem addresses the context of compiling code to implement a runtime engine.]
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Hostetler with the teachings of Rotem by modifying Hostetler to include the operation of compiling code to implement the runtime engine of Hostetler. The motivation would have been to utilize a compiler to execute code more efficiently, as suggested by Rotem, § 1, paragraph 2: “machine learning frameworks have started to hand over the graph to compilers that execute code more efficiently.”	
As to claim 2, the combination of Hostetler and Rotem teaches the method of claim 1, wherein during the throttling operations, the runtime engine identifies portions of the DNN to mask out and/or selects portions of the DNN to traverse [Hostetler, § 6: “We instantiated throttleable NNs using gated networks composed of many smaller components that can be turned on and off.” § 4, paragraph 2: “the control parameter u determines the number of gated blocks that should be used, and the choice of which blocks to turn on is made.” That is, a component that is turned off is masked out, while a component that is turned on is traversed.], but Hostetler as modified thus far does not teach the remaining limitation of “based on a graph analysis involving the model, the set of weights and the runtime metadata.”
Rotem further teaches “based on a graph analysis involving the model, the set of weights and the runtime metadata.” [§ 6.1 (“Glow Runtime Components”), paragraph 4: “Figure 9 shows a simple example of a partitioned, directed graph converted into a schedule.” As shown in FIG. 9, caption: “A simple example showing a graph partitioned into multiple sub-graphs, themselves making up a directed graph, and then converted into a schedule by the executor.” That is, the executor, which is part of the runtime components that are analogous to a runtime engine, creates a schedule based on the partitioned graph. Note that the original graph is a model with a set of weights, as described in § 6.1, paragraph 1: “Depending on each accelerator’s available memory and the size of the weights of a model, we may want or need to partition an input network into sub-graphs across multiple accelerators in order to saturate each accelerator.” It is noted that the limitation of “runtime metadata” is already taught by Hostetler as being part of the model. Since Rotem teaches a runtime engine in the form of the runtime components, Rotem’s teachings are compatible with the runtime metadata of Hostetler, and Rotem is also deemed to teach additional runtime metadata in the form of the graph shown in FIG. 9. Moreover, the runtime schedule in Rotem has the function of identifying “portions of the DNN to mask out and/or selects portions of the DNN to traverse,” since it is used to execute the neural network model.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the combination of Hostetler and Rotem thus far, the above further teachings of Rotem by modifying the runtime engine so that it identifies portions of the DNN to mask out and/or selects portions of the DNN to traverse “based on a graph analysis involving the model, the set of weights and the runtime metadata.” The motivation would have been to implement a runtime engine that is capable of partitioning models, queueing requests, and executing models across multiple devices. (Rotem, § 6, paragraph 1: “Glow provides a runtime that is capable of partitioning models, queueing requests, and executing models across multiple devices. It provides a host level abstraction for compiling and loading models and handling concurrent inference requests on all those models.”).

As to claim 3, the combination of Hostetler and Rotem teaches the method of claim 2, wherein the graph analysis involves scheduling data-fetching operations based on the runtime metadata to facilitate execution of the DNN. [Rotem, § 6.1 (“Glow Runtime Components”), paragraph 4: “Figure 9 shows a simple example of a partitioned, directed graph converted into a schedule.” As shown in FIG. 9, caption: “A simple example showing a graph partitioned into multiple sub-graphs, themselves making up a directed graph, and then converted into a schedule by the executor.” Note that the scheduling, as shown in FIG. 9, involves input operations (i.e., data-fetching operations) based on the structure of the model (Rotem, § 6, paragraph 1: “Glow provides a runtime that is capable of partitioning models, queueing requests, and executing models across multiple devices.”). The runtime component is used to execute the model. Thus, the limitation of “to facilitate execution of the DNN” is taught.]
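For illustration only, the conversion of a partitioned, directed graph into a schedule can be sketched as a dependency-ordered staging of sub-graphs. The sketch below is not Rotem's executor and does not use any Glow API; the sub-graph names (`P0` to `P3`), the dependency table, and the function `build_schedule` are hypothetical, and the staging relies only on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical partition graph (illustrative names only): each key is a
# sub-graph, each value is the set of sub-graphs whose outputs it consumes.
dependencies = {
    "P0": set(),
    "P1": {"P0"},
    "P2": {"P0"},
    "P3": {"P1", "P2"},
}


def build_schedule(deps):
    """Group sub-graphs into stages; a stage becomes ready only after
    every sub-graph it depends on has been marked done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sub-graphs with satisfied dependencies
        stages.append(ready)
        sorter.done(*ready)  # mark the stage complete, unlocking successors
    return stages


schedule = build_schedule(dependencies)
print(schedule)
```

Under these assumptions, each stage's inputs are the outputs of earlier stages, which is the sense in which a schedule of this kind orders input (data-fetching) operations according to the structure of the graph.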

As to claim 4, the combination of Hostetler and Rotem teaches the method of claim 1, wherein the runtime metadata comprises information about statistically relevant execution paths through the DNN, [Hostetler, § 4, last paragraph: “In the first phase, we train the ‘data path’ with random gating to optimize only L while being ‘compatible’ with gating. In the second phase, we train the gating controller to optimize the full objective J while keeping the data path fixed.” Hostetler, FIG. 1(b) illustrates examples of data paths: “This paper considers performance throttling via selective gating. This figure shows different ways of organizing gated components – widthwise vs. depthwise gating, and independent vs. nested ordering” (caption). Note that the data path is used in the gating module which has the form “y” (see equation (2) in text in § 3.1). See also § 2.1: “each layer has multiple data paths, and a ‘Composer’ module chooses which path to take in each layer. The Composer takes a control parameter as input and its loss function penalizes complexity weighted by the control signal. We describe a broader TNN framework that subsumes the model of Odena et al. (2017).”] which are determined based on activations in the DNN and associated sub-tensors. [Hostetler, § 4, last paragraph: “In the first phase, we train the ‘data path’ … In the second phase, we train the gating controller to optimize the full objective J while keeping the data path fixed.” As shown in Algorithm 1, the data path and gate controller are trained, and this training is based on the neural network model. Note that the gating module is trained based on the activations of the DNNs, since the DNNs in Hostetler are CNNs of known model types such as VGG, ResNeXt-w, ResNet-D, and DenseNet, as shown in Table 1, which are well known models that have activation functions.
Moreover, the use of activation is described in § 4.2, paragraph 1: “our task is to learn the function pψ giving the activation probabilities of each component”; paragraph 3: “Relaxation approaches soften the discrete gate vector into a continuous vector of ‘activation strengths.’” With respect to the limitation of “associated sub-tensors,” weight matrices are taught in the form of “convolutional filters” as discussed in the rejection of claim 1. Any sub-portion of the filter may be regarded as a “sub-tensor,” noting that the instant claim language does not require a particular sub-tensor to be functionally distinguished from other sub-tensors or the full tensor; thus, any tensor, such as a weight matrix, is considered to include sub-tensors.]

As to claim 5, the combination of Hostetler and Rotem teaches the method of claim 1, wherein the runtime metadata specifies runtime masks to facilitate selectively executing inference-processing operations for the DNN. [In Hostetler, the gating function, which is specified by the data path parameters and the gate controller parameters (which correspond to the metadata, as discussed in the rejection of claim 1), constitutes one or more masks to facilitate selectively executing inference-processing operations. See Hostetler, § 3.1, which teaches the gating function g(x,u) and the normalized gating function in equation (3). The gating function selectively executes certain blocks of the neural network in inference-processing operations. See Hostetler, § 4, paragraph 2: “the control parameter u determines the number of gated blocks that should be used, and the choice of which blocks to turn on.”]
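For illustration only, the manner in which a gate vector operates as a mask over blocks can be sketched as follows. This is a minimal sketch under stated assumptions, not Hostetler's implementation: the function names `nested_gate_mask` and `gated_forward` are hypothetical, the "fixed rule" is assumed to be "turn on the first k blocks," and the outputs of enabled blocks are assumed to be combined by summation.

```python
def nested_gate_mask(num_blocks, u):
    """Binary mask over blocks for a control parameter u in [0, 1].

    Assumed fixed rule: enable the first k blocks, with k rounded down so
    that the fraction of enabled blocks never exceeds u.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    k = int(u * num_blocks)
    return [1 if i < k else 0 for i in range(num_blocks)]


def gated_forward(x, blocks, mask):
    """Combine the outputs of enabled blocks (summation is an assumption).

    Blocks whose gate is 0 are never called, so they consume no compute.
    """
    return sum(block(x) for block, gate in zip(blocks, mask) if gate)


# Toy "blocks": each one simply scales its input by a fixed weight.
blocks = [lambda x, w=w: w * x for w in (1.0, 2.0, 3.0, 4.0)]

mask = nested_gate_mask(len(blocks), u=0.5)  # two of four blocks enabled
y = gated_forward(2.0, blocks, mask)
print(mask, y)
```

In this sketch a gate value of 0 means the corresponding block is skipped entirely, which is the sense in which a mask "selectively executes" inference-processing operations.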

As to claim 6, the combination of Hostetler and Rotem teaches the method of claim 5, wherein the runtime metadata specifies different masks for different DNN outputs. [As noted in the rejections above, in Hostetler, the metadata specifies “which path to take in each layer” (§ 2, paragraph 1). For example, Hostetler, § 1.1, paragraph 3, teaches that “When gi = 0, the component fi is effectively disabled.” Since there is a plurality of components (§ 3.1, paragraph 1: “vector of components”) and corresponding gating functions, as indicated by equation (2) in § 3.1, there is a plurality of masks (i.e., functions that enable different blocks to be turned on or off). In regards to the limitation of “DNN outputs,” the instant claim language does not specify what those outputs are. Hostetler, § 3.2, paragraph 2, teaches that “blocks are sets of neurons.” Thus, noting that “outputs” broadly reads on intermediate outputs within a neural network, each block has its own different, respective set of outputs.]

As to claim 8, the combination of Hostetler and Rotem teaches the method of claim 1, wherein the runtime engine is configured to determine a current runtime state and a target runtime state for the DNN, [Hostetler, § 2: “The ‘complexity loss,’ C, measures the resources used – energy, CPU time, etc. – when the network processes example x at ‘effort level’ u, and λ controls the balance of the two losses.” Furthermore, § 4, paragraphs 1-3 teach: “The natural measure of complexity is the number of active components…c(g)… We enforce the constraint that the actual complexity c(g) should not exceed the target complexity u.” Thus, in equations (5) and (6) in § 4, “u” (the target complexity) corresponds to a target runtime state, and c(g) corresponds to a current runtime state.] wherein the throttling operations select an operational plan to achieve the target runtime state from the current runtime state. [As discussed in Hostetler, § 2, the network minimizes a loss that includes the loss C(x,u), which is based on the difference between the target complexity u and the actual complexity c(g). See also Appendix A, paragraph 1, which teaches an experiment run at different values of u. In regards to the “operational plan,” this limitation is taught by Hostetler, § 4, paragraph 1: “to create a model that varies its complexity in response to the control parameter u.”]
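For illustration only, the relationship between the actual complexity c(g) (mapped above to the current runtime state) and the target complexity u (mapped above to the target runtime state) can be sketched as follows. The sketch does not reproduce Hostetler's equations (5) and (6); normalizing c(g) to the fraction of active components, and the helper names `complexity`, `within_budget`, and `overshoot`, are assumptions made for the example.

```python
def complexity(gates):
    """c(g): fraction of components that are active (assumed normalization)."""
    return sum(gates) / len(gates)


def within_budget(gates, u):
    """True when the actual complexity c(g) does not exceed the target u."""
    return complexity(gates) <= u


def overshoot(gates, u):
    """Amount by which c(g) exceeds u; zero when the constraint is met."""
    return max(0.0, complexity(gates) - u)


current_gates = [1, 1, 1, 0]  # three of four components active
target_u = 0.5                # target complexity

print(complexity(current_gates))               # actual complexity c(g)
print(within_budget(current_gates, target_u))  # is the constraint satisfied?
print(overshoot(current_gates, target_u))      # gap between current and target
```

Here a nonzero overshoot indicates that fewer components must be enabled to move from the current state to the target state.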

As to claim 9, the combination of Hostetler and Rotem teaches the method of claim 8, but Hostetler as modified thus far does not teach the further limitation of the instant claim. 
Rotem further teaches wherein the current runtime state comprises a current execution context and a current input context, [§ 6.1, paragraph 3: “The Executor handles the execution of a network. It tracks each sub-network’s execution state and propagates sub-network inputs and outputs. The Executor is responsible for asynchronously handing incoming inference requests for a network and returning the collated results. Figure 9 shows a simple example of a partitioned, directed graph converted into a schedule.” That is, the variables being handled by the executor in the execution state correspond to an execution context, and the inputs and outputs that are propagated correspond to an input context.] wherein the current execution context comprises current activations and/or outputs for the DNN, and the current input context comprises contextual features associated with current inputs to the DNN. [§ 6.2: “3. The DeviceManager loads inputs onto the card and begins execution. When done, it reads outputs and signals completion. 4. The Executor triggers any sub-networks with satisfied dependencies.” Note that the outputs referred to here are outputs from other parts of the overall network that will be used as inputs for “sub-networks with satisfied dependencies.” The dependencies here refer to the dependencies shown in the schedule and graph of FIG. 9. Thus, the schedule and the graph dependencies may be regarded as “contextual features associated with current inputs.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated, into the combination of Hostetler and Rotem thus far, the above further teachings of Rotem by modifying the runtime engine to include the features taught by Rotem such that “the current runtime state comprises a current execution context and a current input context, wherein the current execution context comprises current activations and/or outputs for the DNN, and the current input context comprises contextual features associated with current inputs to the DNN.” The motivation would have been to implement runtime engine functionalities that are capable of partitioning models, queueing requests, and executing models across multiple devices. (Rotem, § 6, paragraph 1: “Glow provides a runtime that is capable of partitioning models, queueing requests, and executing models across multiple devices. It provides a host level abstraction for compiling and loading models and handling concurrent inference requests on all those models.”).

As to claims 11-16 and 18-19, these claims are directed to a computer-readable storage medium for performing operations that are the same or substantially the same as those recited in claims 1-6 and 8-9, respectively. Therefore, the rejections made to claims 1-6 and 8-9 are applied to claims 11-16 and 18-19, respectively.
Additionally, Hostetler teaches “a non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for facilitating dynamic runtime execution of a deep neural network (DNN)” [Appendix A, paragraph 1: “All of our experiments are implemented using the PyTorch library…and run on Nvidia GTX 1080 Ti and RTX 2080 Ti GPUs.” Note that these GPUs are used together with a computer that stores the PyTorch library. Since Hostetler teaches that its method is performed on a computer, the instant limitation of a non-transitory computer-readable storage medium storing computer-executable instructions is implicitly disclosed.]

As to claims 21-22, these claims are directed to a system for performing operations that are the same or substantially the same as those recited in claims 1-2, respectively. Therefore, the rejections made to claims 1-2 are applied to claims 21-22, respectively.
Additionally, Hostetler teaches “a system that facilitates dynamic runtime execution of a deep neural network (DNN), comprising: at least one processor and at least one associated memory; and a processing mechanism that executes on the at least one processor, wherein during operation.” [Appendix A, paragraph 1: “All of our experiments are implemented using the PyTorch library…and run on Nvidia GTX 1080 Ti and RTX 2080 Ti GPUs.” Note that these GPUs are used together with a computer that stores the PyTorch library. Since Hostetler teaches that its method is performed on a computer, the instant limitations of a processor, memory, and a processing mechanism in the form of executed software are implicitly disclosed.]

2.	Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Hostetler in view of Rotem and further in view of Uchida et al., “Embedding Watermarks into Deep Neural Networks,” In Proceedings of ICMR ’17, June 6–9, 2017, Bucharest, Romania, 9 pages (“Uchida”).
As to claim 7, the combination of Hostetler and Rotem teaches the method of claim 1, but does not teach the further limitation that the runtime engine is configured to “cryptographically decode a watermark pattern encoded in the set of weights to facilitate validating the DNN.” 
Uchida, in an analogous art, teaches the above limitations. Uchida generally relates to “embedding watermarks into deep neural networks” (see title), and is therefore in the field of machine learning.
In particular, Uchida teaches “wherein the runtime engine is configured to cryptographically decode a watermark pattern encoded in the set of weights to facilitate validating the DNN.” [Abstract: “we propose a general framework for embedding a watermark in model parameters, using a parameter regularizer.” Note that “parameters” in this context refers to “weights.” See, e.g., § 2, which refers to “a model network with or without trained parameters.” The act of detecting (decoding) the watermark is described in § 3.3: “In this section we discuss the design of the embedding parameter X, which can be considered as a secret key in detecting and embedding watermarks.” See also § 4.2.2 (“Detecting Watermarks”). The detection relies on the parameter X, which is the secret key. Therefore, the detection is considered to be a cryptographic decoding operation. Successful detection is also considered to be validation to the extent required by the instant claim language.]
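For illustration only, the key-based detection discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Uchida's implementation: the function names, the use of a seed to derive the embedding parameter X, and the thresholding of the projection at zero are all hypothetical choices made for the example, and the toy perceptron-style embedding step merely stands in for the parameter regularizer applied during training that Uchida describes.

```python
import random


def make_secret_key(num_bits, num_weights, seed):
    """Hypothetical helper: derive a projection matrix X (the secret key) from a seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_weights)]
            for _ in range(num_bits)]


def extract_watermark(weights, key):
    """Project the weights with each row of X and threshold at zero to get one bit per row."""
    return [1 if sum(x * w for x, w in zip(row, weights)) >= 0 else 0
            for row in key]


def embed_watermark(weights, key, bits, step=0.01, margin=0.5, max_iters=10000):
    """Toy embedding (assumption): nudge the weights until every projection
    has the sign required by its bit, with a small safety margin."""
    w = list(weights)
    for _ in range(max_iters):
        updated = False
        for row, bit in zip(key, bits):
            target = 1.0 if bit else -1.0
            proj = sum(x * v for x, v in zip(row, w))
            if target * proj < margin:
                # Move the weights along this key row toward the desired sign.
                w = [v + step * target * x for v, x in zip(w, row)]
                updated = True
        if not updated:
            break
    return w


def validate_model(weights, key, expected_bits):
    """Treat the model as validated when the decoded bits match the expected pattern."""
    return extract_watermark(weights, key) == expected_bits


# Usage: embed a 16-bit pattern into 128 toy weights, then detect it with the key.
key = make_secret_key(num_bits=16, num_weights=128, seed=1234)
init_rng = random.Random(0)
original = [init_rng.gauss(0.0, 0.1) for _ in range(128)]
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
marked = embed_watermark(original, key, pattern)
is_valid = validate_model(marked, key, pattern)
```

In this sketch, a party holding the same seed can regenerate X and recover the pattern, while a matrix generated from a different seed would be expected to produce essentially unrelated bits.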
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Hostetler and Rotem with the teachings of Uchida by modifying the runtime engine to be further configured to “cryptographically decode a watermark pattern encoded in the set of weights to facilitate validating the DNN.” The motivation would have been to “utilize digital watermarking technology, which is used to identify ownership of the copyright of digital content such as images, audio, and videos.” (Uchida, § I, paragraph 3).

As to claim 17, this claim recites further limitations that are the same or substantially the same as those recited in claim 7. Therefore, the rejection made to claim 7 is applied to claim 17. 

3.	Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hostetler in view of Rotem and further in view of Pal et al. (US 2019/0348999 A1) (“Pal”).
As to claim 10, the combination of Hostetler and Rotem teaches the method of claim 1, but does not teach the further limitation that the runtime engine is further configured to “decode the set of weights during the inference-processing operations based on dictionary index values in the runtime metadata.”
Pal, in an analogous art, teaches the above limitation. Pal generally pertains to a method for “data compression and decompression” ([0001]), with specific application to deep neural networks (see [0003]). Therefore, Pal is in the same field of endeavor as the claimed invention, namely machine learning.
In particular, Pal teaches “decode the set of weights during the inference-processing operations based on dictionary index values in the runtime metadata.” [[0088]: “FIG. 7 illustrates a block diagram 700 of a decompression technique applied in neural networks…At 718, after having compressed binary file as an input, an address dictionary is created which contains the starting address, offset of each layer/block (containing a single or multiple layers) of the neural network. The address dictionary also contains the data resolution information of the individual layers (bias, kernel weights etc.,) At 704, when the system demands to fetch a particular layer, it sends the beginning address information and the corresponding layer is fetched from the compressed binary file. At 706 and 712, layer 3 and layer 2 are fetched from the compressed binary file 702 using the address dictionary 718….At 706 and 712, it is shown that the corresponding layers of the corresponding neural net model weight file are fetched….At 710, a decompressed file is generated using the Huffman Decoding algorithm where the weights are decoded in single precision 32-bit (IEEE 754) format corresponding to the layer 3.” That is, the “beginning address information” corresponds to dictionary index values, and serves as the basis for the subsequent decoding of the weights. Note that the limitations of “the inference-processing operations” and “runtime metadata” are already taught by the current combination of references, and that Pal has been cited for the decoding method and address dictionary.]
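For illustration only, the address-dictionary lookup followed by Huffman decoding discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Pal's implementation: the function names and the dictionary fields "start" and "length" are hypothetical, the compressed file is modeled as a string of '0' and '1' characters rather than a packed binary file, and the weights are modeled as small integers rather than the single precision 32-bit values that Pal describes.

```python
import heapq
from collections import Counter


def build_huffman_codes(symbols):
    """Return a prefix-free code table {symbol: bit string} for the given symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie breaker, partial code table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def compress_model(layers):
    """Encode every layer and record where each one starts in the bit stream."""
    codes = build_huffman_codes([w for ws in layers.values() for w in ws])
    bitstream, address_dict = "", {}
    for name, ws in layers.items():
        encoded = "".join(codes[w] for w in ws)
        # The address dictionary holds the starting address and extent of each layer.
        address_dict[name] = {"start": len(bitstream), "length": len(encoded)}
        bitstream += encoded
    return bitstream, codes, address_dict


def fetch_and_decode_layer(bitstream, codes, address_dict, name):
    """Fetch one layer by its recorded address, then Huffman-decode only that layer."""
    entry = address_dict[name]
    chunk = bitstream[entry["start"]:entry["start"] + entry["length"]]
    decode_table = {code: sym for sym, code in codes.items()}
    weights, current = [], ""
    for bit in chunk:
        current += bit
        if current in decode_table:  # prefix-free, so the first match is the symbol
            weights.append(decode_table[current])
            current = ""
    return weights


# Usage: compress three toy layers, then decode only "layer3" on demand.
toy_layers = {
    "layer1": [0, 0, 1, 0, -1, 0, 0, 2],
    "layer2": [1, 1, 0, 0, 0, -1],
    "layer3": [2, 0, 0, 0, 1, -2, 0],
}
stream, code_table, addresses = compress_model(toy_layers)
layer3 = fetch_and_decode_layer(stream, code_table, addresses, "layer3")
```

The point of the sketch is that the starting address stored for a layer selects which portion of the compressed data is decoded, so that a single layer can be recovered without decoding the entire file.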
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Hostetler and Rotem with the teachings of Pal by incorporating the decoding method and address dictionary of Pal into the inference-processing operations and the runtime metadata of Hostetler as modified by Rotem, and by modifying the runtime engine so as to be further configured to “decode the set of weights during the inference-processing operations based on dictionary index values in the runtime metadata.” The motivation would have been to perform decompression of neural network weights as part of a compression and decompression technique that facilitates efficient distributed training, easy embedded platform development and model upgradation, as suggested by Pal, [0002] (“Compressing a neural net is therefore necessary from the point of view of efficient distributed training, easy embedded platform development and model upgradation by exporting to the client over network etc.”).

As to claim 20, this claim recites further limitations that are the same or substantially the same as those recited in claim 10. Therefore, the rejection made to claim 10 is applied to claim 20. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following documents depict the state of the art.
McBride et al., US 2018/0299943 A1, teaches “a DNN module [that] can maintain one or more bandwidth throttling mechanisms” (abstract).
Minkin et al., US 10,664,282 B1, teaches runtime augmentation of engine instructions, with particular application to neural networks.
Rossi et al., US 2020/0082273 A1, teaches compiling neural networks.
Simonyan et al., “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556v6 [cs.CV] 10 Apr 2015, teaches the “VGG” neural network discussed in Hostetler, including the use of weights and activations.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Y.D.H./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124