DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/12/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The following claim language is unclear:
As per claim 1, line 1 recites “scalable enclave protection”; it is unclear from the context of the claim what constitutes scalable enclave protection. While the claim describes the steps of tailoring at least one machine learning program, the claim does not explicitly recite how enclave protection is scaled. Does tailoring at least one ML program affect the enclave protection?
As per claim 1, lines 5-7 recite “allocating a shared memory for computing a plurality of layers of a neural network, the shared memory reducing total memory usage during the computation of the plurality of layers”; it is unclear from the context of the claim how the allocation of a shared memory reduces the total memory usage during computation of a plurality of layers.
As per claim 1, lines 10-11 recite “addressing memory usage dependencies of the layers using inter-layer dependency resolution”; it is unclear what constitutes inter-layer dependency resolution. For examination purposes, examiner interprets the limitation as detecting data dependencies and resolving them to be non-dependent. MPEP 2172.01 states “If a claim fails to interrelate essential elements of the invention as defined by applicant(s) in the specification, the claim may be rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as indefinite.”
As per claim 1, lines 12-14 recite “partitioning computation of any high memory usage layers into multiple sessions using intra-layer computation partitioning”; it is unclear what a session is and what constitutes the “intra-layer computation partitioning” used for partitioning. For examination purposes, examiner interprets the limitation as partitioning the neural network layers into a sequence of superlayers based on the threshold storage capacity of the memory.
As per claim 1, line 14 recites “a threshold memory usage”; it is unclear whether the threshold corresponds to a threshold of a layer or a threshold of the at least one enclave in which the tailored ML program will execute. For examination purposes, examiner interprets the limitation as the limit of the enclave/hosting environment.
As per claim 5, lines 1-2 recite “scheduling the at least one enclave into at least one processor such that memory usage does not exceed a memory budget”; it is unclear whether the memory usage corresponds to the tailored ML program or a size of the enclave and whether the memory budget corresponds to a hardware memory limit.
As per claim 6, it is unclear whether the new ML program execution request corresponds to executing a different ML program than the original ML program (pre-tailored version). Further, does the “launching the at least one enclave” correspond to the enclave already executing the tailored ML program of claim 1, or is it a new or different enclave for the new ML program execution request?
Claims 2-7 are dependent on claim 1 and fail to cure the deficiencies above for claim 1. Therefore, they are all rejected under the same rationale.
Regarding claim 8, it is a media/product type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above.
Claims 9-14 are dependent on claim 8 and fail to cure the deficiencies above for claim 8. Therefore, they are all rejected under the same rationale.
Regarding claim 15, it is a system type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above.
Claims 16-20 are dependent on claim 15 and fail to cure the deficiencies above for claim 15. Therefore, they are all rejected under the same rationale.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Tople et al., “PRIVADO: Practical and Secure DNN Inference with Enclaves” (hereinafter “Tople”), in view of Woo (US 10,019,668 B1).

Regarding claim 1, Tople teaches the invention substantially as claimed including a computer-implemented method for efficient and scalable enclave protection for machine learning (ML) programs (Abstract: DNN Inference-as-a-service), comprising: 
tailoring at least one ML program to generate at least one tailored ML program for execution within at least one enclave (Abstract: it transforms any deep learning framework written in C/C++ to be free of input-dependent access patterns. PRIVADO is fully automated and has a low TCB: with zero developer effort, given an ONNX description, it generates compact C code for the model which can run within SGX-enclaves.), including: 
allocating a shared memory for computing a plurality of layers of a neural network, the shared memory reducing total memory usage during the computation of the plurality of layers (4 Sources of Leakage in DNNs: By customizing our solution to the specifics of how memory is accessed in DNN algorithms, we believe the overhead of making DNN inference data-oblivious can be significantly reduced. We make two key observations. First, the vast majority of computations in DNNs involve linear layers (fully connected or convolution layers) that exhibit only deterministic accesses, i.e., the memory access patterns do not vary based on the input. Second, certain types of DNN layers, such as ReLU or max-pool, exhibit input-dependent accesses, i.e., their memory access patterns can vary depending on the input (as we show in Section 3). However, even in layers that exhibit input-dependent accesses, the accesses are of a very specific type: either a given memory location is accessed, or no memory locations are accessed.; 5.3 Privado-Generator: Reducing TCB. PRIVADO-Generator reduces the TCB by trimming the Torch library to only the bare-minimum set of files required to compile a given model. Once it identifies all layers in the ONNX model, it includes and compiles only the necessary Torch files from the math and the NN libraries. This step excludes irrelevant library code and thereby reduces the trusted code-base.); 
loading model parameter data for each of the plurality of layers onto the shared memory on-demand (Introduction: To address the ease-of-use challenge, PRIVADO uses the PRIVADO-Generator which takes as input models represented in the popular ONNX format [17], and automatically generates a minimal set of enclave-specific code and encrypted parameters for the model. There is no custom programming required. The server simply loads the auto-generated code for the model within an enclave where the model parameters are decrypted; Figure 1: The model owner uploads the model binary and the encrypted parameters to an SGX-enabled cloud provider; Table 1 shows the number of parameters per model which are then loaded into the enclave of Table 2 having a max SGX memory of 90 MB); 
addressing memory usage dependencies of the layers using inter-layer dependency resolution (1 Introduction: Privado uses a component we call the Privado-Converter to automatically detect all such data-dependent access patterns in a given deep learning framework. It modifies them to become data-independent using cmov; 5.2 Privado-Converter: PRIVADO-Converter’s first step is to analyze all source-code of these libraries and identify all branches. To do this, it traverses the AST of each function in the library and reports all conditional statements such as if-else, input-dependent loop guards, and ternary operations. PRIVADO then performs an interprocedural data flow analysis to identify all the input-dependent variables [33]. Finally, it then collects all the variables that are involved in each conditional statement and selects only the ones that are input-dependent.); and 
executing the at least one tailored ML program within the at least one enclave (5.3 Privado-Generator: The model owner shares the encryption key over a secure channel to the enclave that executes the model binary.).
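
For illustration only, the data-independent selection for which Tople is cited above (replacing input-dependent branches with cmov) can be sketched as follows. This is a minimal Python sketch and not Tople's code: the function names oblivious_select and oblivious_relu are hypothetical, and Python itself provides no constant-time or access-pattern guarantees, so the sketch shows only the branch-free arithmetic form of the selection.

```python
def oblivious_select(cond: int, a: int, b: int) -> int:
    """Return a if cond == 1, else b, without an if/else on cond.

    Hypothetical helper: it mimics the effect of a conditional move by
    building an all-ones or all-zeros mask from cond (which must be 0 or 1).
    """
    mask = -cond  # cond == 1 -> -1 (all bits set); cond == 0 -> 0
    return (a & mask) | (b & ~mask)


def oblivious_relu(x: int) -> int:
    """ReLU on an integer expressed through the branch-free select.

    The comparison below is ordinary Python; a real data-oblivious
    implementation would also have to compute it without branching.
    """
    is_positive = int(x > 0)
    return oblivious_select(is_positive, x, 0)


if __name__ == "__main__":
    print([oblivious_relu(v) for v in (-3, 0, 5)])
```

Both operands of the selection are always read, so which value is kept does not change which locations are touched; that is the property the cited passage relies on.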

Tople discusses pooling layers after convolution layers to reduce the output size at each layer by 75%, but Tople does not expressly teach partitioning computation of any high memory usage layers into multiple sessions using intra-layer computation partitioning, the high memory usage layers including layers having a memory usage higher than a threshold memory usage.

However, Woo teaches partitioning computation of any high memory usage layers into multiple sessions using intra-layer computation partitioning, the high memory usage layers including layers having a memory usage higher than a threshold memory usage (Col. 2, lines 41-46: In some implementations, the memory of the hardware circuit has a threshold storage capacity, and determining the partitioning of the neural network layers into a sequence of superlayers, includes: partitioning the neural network layers into a sequence of superlayers based on the threshold storage capacity of the memory of the hardware circuit. In some implementations, the neural network layers are partitioned into a sequence of superlayers so as to not exceed the threshold storage capacity of the memory when the hardware circuit processes the batch of neural network inputs.; Col. 15, lines 23-36: In some implementations, circuit 100 uses equation 1, the size parameter of the inputs (e.g., in memory units), the batch size, and the aggregate memory used for the parameters to determine a total on-chip memory usage for one or more groups of layers. Circuit 100 can compare the total memory usage for each group of layers to the 500 MB on-chip storage capacity. Circuit 100 can then determine a partitioning or grouping of layers that form a sequence of superlayers based on the results of the comparison. Circuit 100 determines the partitioning of the layers into a sequence of superlayers so as to not exceed the threshold storage capacity of the on-chip memory (500 MB) when a hardware circuit processes a batch of neural network inputs for the working sets.).
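
For illustration only, the threshold-based grouping described in the cited passages of Woo can be sketched as a greedy pass over per-layer memory estimates. This is a sketch under stated assumptions rather than Woo's method: the function name, the per-layer figures, and the greedy strategy are hypothetical, and Woo's equation 1 for total on-chip memory usage is not reproduced here.

```python
from typing import List


def partition_into_superlayers(layer_mem_mb: List[float],
                               capacity_mb: float) -> List[List[int]]:
    """Group consecutive layer indices so each group fits within capacity_mb.

    layer_mem_mb holds hypothetical per-layer memory estimates (in MB).
    A new group is started whenever adding the next layer would exceed
    the threshold storage capacity.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for idx, mem in enumerate(layer_mem_mb):
        if mem > capacity_mb:
            # A single layer over the threshold cannot be handled by
            # grouping alone; it would need to be split internally.
            raise ValueError(f"layer {idx} alone exceeds the capacity")
        if current and current_total + mem > capacity_mb:
            groups.append(current)
            current, current_total = [], 0.0
        current.append(idx)
        current_total += mem
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical layer sizes checked against a 500 MB threshold.
    print(partition_into_superlayers([200, 250, 100, 400, 50], 500))
```

The ValueError branch marks the case in which a single layer exceeds the threshold by itself, which is the situation the claimed “high memory usage layers” language addresses.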
	
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Woo with the teachings of Tople to evaluate the memory use of layers and determine whether to split the computation so as not to exceed a memory threshold, as discussed by Woo (500 MB) and Tople (90 MB). The modification would have been motivated by the desire not to exceed the threshold storage capacity.

Regarding claim 2, Tople teaches further comprising profiling memory usage to generate memory profile results for tailoring the at least one ML program, including: profiling I/O memory by analyzing how the at least one ML program uses I/O memory buffers and profiling weight memory by analyzing how the at least one ML program uses weight memory buffers (Table 2; Column “Overhead in %”; 6.4 Performance, See at least “SGX-Enclaves improve efficiency for models that fit entirely in SGX memory” and “Page-faults cause most of the overhead”).

Regarding claim 3, Tople teaches wherein the model parameter data is loaded from at least one model file associated with the at least one ML program (Fig. 3; Step 2, ONNX (Model+Parameters), Enclave Model Binary, Encrypted Parameters).

Regarding claim 8, it is a media/product type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above.

Regarding claim 9, it is a media/product type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale above.

Regarding claim 10, it is a media/product type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale above.

Regarding claim 15, it is a system type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above. The additional limitations “a memory device having program code stored thereon; and at least one processor device operatively coupled to a memory device and configured to execute program code stored on the memory device” are taught by Woo in at least Col. 5, lines 62-65: processing unit(s) of controller 108 executes instructions stored in memory to cause controller 108 and circuit 100 to perform one or more functions described in this specification.

Regarding claim 16, it is a system type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale above. 

Regarding claim 17, it is a system type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale above.

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tople and Woo, as applied to claims 1, 8, and 15, in further view of Costa (US 2018/0211035 A1).

Regarding claim 7, neither Tople nor Woo expressly teaches further comprising terminating the at least one enclave.
However, Costa teaches further comprising terminating the at least one enclave ([0076] Primitives for enclave lifecycle management may include methods for causing the instantiation or termination of an enclave such as enclave 914.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Costa with the teachings of Tople and Woo to allow the user to terminate the enclave. The modification would have been motivated by the desire of allowing the user to perform lifecycle management operations.

Regarding claim 14, it is a media/product type claim having similar limitations as claim 7 above. Therefore, it is rejected under the same rationale above.

Regarding claim 20, it is a system type claim having similar limitations as claim 7 above. Therefore, it is rejected under the same rationale above.

Allowable Subject Matter
Claims 4-6, 11-13, and 18-19 would be allowable if rewritten to overcome the rejections under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gu et al. (US 2020/0082259 A1) System for Measuring Information Leakage of Deep Learning Models.
Arora (US 2020/0082279 A1) Neural Network Inferencing on Protected Data.
Bitaud et al. (US 2021/0192360 A1) Artificial Neural Network.
Gu et al. (US 2020/0082270 A1) Verifiable Deep Learning Training Service.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA whose telephone number is (571)270-0692. The examiner can normally be reached Monday-Friday, 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai T An, can be reached at (571)-272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JORGE A CHU JOY-DAVILA/Primary Examiner, Art Unit 2195