DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication No. 2020/0267053 A1 to Zheng et al. (“Zheng”) in view of U.S. Patent Publication No. 2020/0219007 A1 to Byers et al. (“Byers”).  
As to claim 1, Zheng discloses a method, by a processor, for load balancing of machine learning operations in a computing environment (Zheng: fig 1-16, [0096-102; 138-145]: fig 7 … model 702 determines network parameters such as latencies and performance metrics … including cloud computing portion 704, load balancing portion 706, edge computing portion 708, user device portion 710 … various input/output parameters 712 714 [0096] … for example, data download rate can reflect a speed which user device e.g. mobile phone can download images and/or video from server on core data center or an edge data center [0100] …).
Zheng did not explicitly disclose dynamically balancing one or more machine learning operations between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric (emphasis added). 
Specifically, Zheng discloses balancing one or more machine learning operations between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric (Zheng: fig 1-16, [0047; 96-104; 138-145]: fig 4 & 6-7 … model 702 determines network parameters such as latencies and performance metrics … including cloud computing portion 704, load balancing portion (balancing) 706, edge computing portion 708, user device portion 710 … various input/output parameters 712 714 [0096-97] … communication component enables communication between system components, models, devices etc over one or more networks e.g. over cloud environment including core and edge data centers … such networks include wired and wireless networks (… between one or more edge computing devices in a wireless communication network and a cloud computing system …) [0087, 90] … management computing entity 410 includes machine learning component 412 … diagram 400 includes inference component 403 and training component 401 … used to determine expected distribution (balancing) of processing given workload across core data center and one or more edge data centers (… between one or more edge computing devices and a cloud computing system …) … can be retrained with different or similar network parameters [0073-74] … fig 1-3 & 5-6 … cloud processing on core data center and/or fog computing can refer to extended computing at edge [0047; 104] … routing component 612 determines to transmit workloads and/or association data to different portions of network (e.g. wireless portion, wired portion) (balancing) … server(s) (device(s)) 614 represents server on network, core data center and/or edge data center (see with [0096-97; 87; 90; 73-74; 104; 47] - redeploy/load balancing edge portions/devices in wireless and cloud portions) [0094-95] … management computing entity 510 can determine the expected distribution 526 of given workload live (such as in real-time or dynamically) [0080] … fig 15 [141-143]).
Nonetheless, Zheng did not explicitly disclose dynamically balancing one or more machine learning operations between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric (emphasis added).
Byers discloses dynamically balancing one or more machine learning operations between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric (emphasis added) (Byers: fig 1-4, [0003-44]: fig 1-2 … ML orchestration which considers performance impacts of both ML learning and ML inference phases concurrently (machine learning operations) and dynamically adjusts depth of ML functions in network levels (cloud/fog/edge) (between cloud and edge/fog) and redeploys (dynamically load balances) ML function(s) on various resources to efficiently meet system performance requirements (see with [0037] - for increasing performance of a selected metric) [0017] … fig 3-4 … for example, at block 410 ML orchestration module (MLOM) determines a first network level change of ML learning function to provide increased performance (see with [0017] - dynamically balancing one or more machine learning operations …) … determines to adjust network level of ML learning functions 205 from cloud network level 102 to lower level such as network edge level 104 closer to data source (see with [0037] … between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric) [0039] … the MLOM 350 is hosted in the cloud network level 102 … in some example embodiments, MLOM (itself) is latency, bandwidth or reliability critical and it can be moved from cloud network level 102 and/or distributed to lower levels of the edge/fog hierarchy (the management MLOM (ML orchestration module) is load balanced between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric) [0026] … different performance requirements on the ML learning or training phase and the ML inference or implementation phase … ML learning phase are typically processor intensive but less real-time intensive such that can be performed in cloud and ML inference phase reacts very quickly to abnormal conditions requiring less processing resources and executes on nodes lower in distributed network levels such as fog (or edge) [0014] … ).
Zheng and Byers are analogous art because they are from the same field of endeavor with respect to ML (machine learning).
Before the effective filing date, for AIA, it would have been obvious to a person of ordinary skill in the art to incorporate strategies by Byers into the method by Zheng. The suggestion/motivation would have been to enable management computing entity 510 to determine the expected distribution 526 of a given workload live (such as in real-time or dynamically) (Zheng: [0080]); to provide ML orchestration which considers performance impacts of both ML learning and ML inference phases concurrently, dynamically adjusts depth of ML functions in network levels (cloud/fog/edge) and redeploys (i.e. load balances) ML function(s) on various resources (Byers: [0017]); to enable an application to learn and adapt to runtime conditions and adjust to changing network and application conditions in real-time to efficiently respond to network conditions as they occur (Byers: [0012]); and to provide that the MLOM, where it is itself latency, bandwidth or reliability critical, can be moved from cloud network level 102 and/or distributed to lower levels of the edge/fog hierarchy (the management MLOM (ML orchestration module) is load balanced between one or more edge computing devices in a wireless communication network and a cloud computing system for increasing performance of a selected metric) (Byers: [0026]).
As to claim 2, Zheng and Byers disclose selecting as the selected metric a power metric, temperature metric, a performance metric, data throughput metric, or a combination thereof  (Zheng: fig 1-16, [0047; 96-104; 138-145]: fig 7 … host-level performance metric can be storage metric … can include CPU utilization … machine learning performance  metric that can include efficiencies, power usage and/or delays associated with routing data to various portions of architecture [0102] … labels 406 can represent optimal distribution of given workload across core data center and one or more  edge data centers … feature vectors 508 can represent parameters of interest e.g. latencies and/or data transmission rates can be extracted from raw data that may be part of network parameters 504 … can represent individual measurable properties or characteristics of transmissions observed over the network architecture, for example, in connection with fig 7 [0071-72; 78]).
For motivation, see rejection of claim 1.
As to claim 3, Zheng and Byers disclose performing an inference operation or training operation by the one or more machine learning operations (Byers: fig 1-4, [0003-44]: different performance requirements on the ML learning or training phase (training operation) and the ML inference or implementation phase (performing an inference operation) … ML learning phase are typically processor intensive but less real-time intensive such that can be performed in cloud and ML inference phase reacts very quickly to abnormal conditions requiring less processing resources and executes on nodes lower in distributed network levels such as fog (or edge) [0014]).
For motivation, see rejection of claim 1.
As to claim 4, Zheng and Byers disclose determining whether the one or more machine learning operations are executing on the one or more edge computing devices, the cloud computing system, or a combination thereof according to a variable (Byers: fig 1-4, [0003-44]: MLOM 350 can reallocate resource allocations for various ML functions (ML operations) ML learning function … ML inference function … determines reallocation to provide increased performance of ML system … for example, if ML inference function are executing below performance parameters (according to variable(s)) on node and/or network level … reallocation may include reallocation of resources across several nodes and/or network levels (see with [0014; 17] - determining whether the one or more machine learning operations are executing on the one or more edge computing devices, the cloud computing system, or a combination thereof) [0043]).
For motivation, see rejection of claim 1.
As to claims 8-11, see similar rejection to claims 1-4, respectively, where the system is taught by the method.
As to claims 15-18, see similar rejection to claims 1-4, respectively, where the product is taught by the method.
Claims 5-7, 12-14 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication No. 2020/0267053 A1 to Zheng et al. (“Zheng”) in view of U.S. Patent Publication No. 2020/0219007 A1 to Byers et al. (“Byers”) and further in view of U.S. Patent Publication No. 2019/0138908 A1 to Bernat et al. (“Bernat”).
As to claim 5, Zheng and Byers disclose the method of claim .
For motivation, see rejection of claim 1.
Zheng did not explicitly disclose setting the variable as the edge computing device, the cloud computing system, or a combination thereof for indicating one or more current platforms performing an inference operation.
Bernat discloses setting the variable as the edge computing device, the cloud computing system, or a combination thereof for indicating one or more current platforms performing an inference operation (Bernat: fig 1-15, [0030-66]: fig 1 & 4-7 … a request for AI inferencing operations for execution or performance with an AI model, such as from an edge device e.g. endpoint, UE, client device etc and operations proceed with identifying relevant data values (setting the variable as …) e.g. identifier selection of SLA, etc from the inferencing request … to perform an inference or AI operation (performing an inference or AI operation) … information request used to obtain binary of relevant AI model for execution on specific hardware platform (… for indicating one or more current platforms performing an inference operation) [0054] … fig 8-10 … depending on real-time requirements a hierarchical structure of data processing and storage nodes are defined (setting the variable(s) as …) including local ultra-low-latency processing, regional storage (edge computing device(s))  as well as remote cloud data-center based storage and processing (cloud computing system(s)) and SLAs (service level agreement) and KPIs (key performance indicators) (according to one or more rules, conditions, or metrics) may be used to identify where data is best transferred (see with [0054] - setting the variable as the edge computing device, the cloud computing system, or a combination thereof for indicating one or more current platforms performing an inference operation and/or according to one or more rules, conditions, or metrics) [0062]  fig 1 & 4-7 [0030; 37; 61-65]).
Zheng, Byers and Bernat are analogous art because they are from the same field of endeavor with respect to ML (machine learning).
Before the effective filing date, for AIA, it would have been obvious to a person of ordinary skill in the art to incorporate strategies by Bernat into the method by Zheng. The suggestion/motivation would have been to enable AI as a Service (AIaaS) for both edge computing and wide area network deployment with selection of appropriate processing and network resources and distribution of processing operations towards edge devices and reduction of unnecessary or improper resource usage (Bernat: [0027]).
As to claim 6, see similar rejection to claim 5 where the method is taught by the method.
As to claim 7, see similar rejection to claims 5-6.
As to claim 7, Zheng, Byers and Bernat further disclose simultaneously performing one or more similar or different inference operations on both the one or more edge computing devices and the cloud computing system (Zheng: fig 1-16, [0045-62; 96-104; 138-145]: fig 1-2 & 4-6 … computing component 610 may use AI e.g. machine learning components in fig 4-5 (fig 4 inference component 403 and training component 401 and machine learning component 410) to determine routing of workload between portions of network architecture (… perform one or more similar or different inference operations …) [0091] server 614 executes at least portions of management computing entity 601 and can represent server on network, a core data center and/or edge data center (see with [0091] - perform one or more similar or different inference operations on both the one or more edge computing devices and the cloud computing system) [0095] … as noted MEC can serve to distribute cloud capabilities to the edge of networks (see with [0091; 95] - on both the one or more edge computing devices and the cloud computing system) where cloud capabilities can be relatively closer to local mobile users [0054] … retrieval, loading and/or execution may be performed in parallel (see with [0091; 95; 54] - simultaneously perform one or more similar or different inference operations …) such that multiple instructions are retrieved, loaded and/or executed together [0055]).
For motivation, see rejection of claim 5.
As to claims 12-14, see similar rejection to claims 5-7, respectively, where the system is taught by the method.
As to claim 19, see similar rejection to claims 5-6, where the product is taught by the method.
As to claim 20, see similar rejection to claim 7.
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
A] US 20210312316 – Lee
The similarity-based hierarchical data loading method involves receiving original data for machine learning training. The original data is divided into baseline data and difference data. The baseline data and the difference data are stored in different memory devices. The baseline data is stored in a first memory device having faster access speed than a second memory device in which the difference data is stored. The baseline data and the difference data are loaded from the different memory devices. The original data are reconstructed from the baseline data and the difference data. The reconstructed original data are fed to a machine learning model to train the machine learning model.
B] US 20220237476 – Benjamin
Systems, methods, and other embodiments associated with a machine learning predictive model for predicting a propensity to implement energy reduction settings are described. Data records including load data for a target group of dwellings is obtained. An empirical load shape is generated for each given target dwelling based on the load data. A target feature vector is generated for each given target dwelling based on at least the empirical load shape corresponding to the given target dwelling. A trained machine learning predictive model is executed on the target feature vectors of the target group of dwellings to identify a set of target dwellings that are likely to reduce electricity consumed in accordance with electricity settings based on at least a generated predicted propensity for a target dwelling to implement the electricity settings.
C] US 20210357268 - O'Donoghue
There is a need for more effective and efficient constrained-optimization-based operational load balancing. In one example, a method comprises determining constraint-satisfying operator-unit mapping arrangements that satisfy an operator unity constraint and an operator capacity constraint; for each constraint-satisfying operator-unit mapping arrangement, determining an arrangement utility measure; processing each arrangement utility measure using an optimization-based ensemble machine learning model that is configured to determine an optimal operator-unit mapping arrangement of the plurality of constraint-satisfying operator-unit mapping arrangements; and performing one or more operational load balancing operations based on the optimal operator-unit mapping arrangement.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUNE SISON whose telephone number is (571)270-5693. The examiner can normally be reached 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emmanuel Moise, can be reached at 571-272-3865. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JUNE SISON/Primary Examiner, Art Unit 2455