Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending.
Examiner Notes
Examiner cites particular paragraphs and/or columns and lines in the references as applied to Applicant’s claims for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. The prompt development of a clear issue requires that the replies of the Applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

USPTO Automated Interview Request (AIR)
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Claim Rejections - 35 USC § 103
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 4, 7, 10, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Schott et al. (US 2021/0256427) (hereinafter Schott as previously cited), Perone et al. (US 2019/0130300) (hereinafter Perone as previously cited), D’Amato et al. (US 2013/0346532) (hereinafter D’Amato as previously cited), Rajkumar et al. (US 2020/0311616) (hereinafter Rajkumar), Wu et al. (US 2017/0295082) (hereinafter Wu as previously cited), and Dippenaar et al. (US 2015/0040127) (hereinafter Dippenaar).

As per claim 4, the combination of references above teaches a computer-implemented method comprising: 
machine learning models can generate inferences in response to an inference request); 
	determining a first inference service group (ISG) from among a plurality of ISGs to route the request to (D’Amato [0079] determine to route request to one of a plurality of nodes), wherein the ISGs comprise virtual machines instantiated on slots on respective hosts (Dippenaar abstract) in groups of hosts that serve a same set of machine learning models (Rajkumar [0095], [0130], [0140], [0163], [0234] a copy of the same machine learning model can be stored locally on a plurality of robots), wherein the ISGs comprise autoscaling groups that scale a number of hosts based on throughput of the machine learning models they host (Wu [0045] utilize auto-scaler for processing throughput which can dynamically adjust/scale a number of virtual machines based on the throughput using a machine learning model), and wherein the first ISG includes the particular machine learning model (Perone [0043] a machine-learning data structure of a non-volatile memory may store a plurality of machine-learning models and select a particular machine learning model based on a dataset); 
	determining a path to the first ISG (D’Amato [0079] select a best path to route requests); 
	determining a particular host of the first ISG to perform an analysis of the request based on the path to the first ISG (D’Amato [0079] determine which node to route request based on the selected best path), the particular host including the particular machine learning model in memory (Perone [0043]); 
	routing the request to the particular host of the first ISG (D’Amato [0079]); 
	performing inference on the request using the particular host (Schott [0019]; [0023]; [0084] generate an inference and produce results); and 


Perone and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Perone teaches storing a plurality of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott in view of Perone because it would provide a flexible and efficient approach to making operating system decisions based on the best features of machine learning inferences and heuristics. Combining inference values with heuristic values provides the best features of machine learning inferences and heuristics in terms of latency and quality of predictions. Machine learning can be used to increase the quality of operations performed by kernel and non-kernel software components; e.g., kernel operations can have higher cache hit rates and better CPU scheduling, and non-kernel operations can have more reliable database checkpointing, better audio/video content presentation, and better optimized hot loops. As such, machine learning can enhance the performance of both the kernel and non-kernel software components, thereby enhancing the overall performance of the computing device.

D’Amato and Schott are both concerned with virtualized computing environments. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while D’Amato teaches selecting a best path for routing requests to be serviced. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott and Perone in view of D’Amato because it would minimize the cost of establishing a cluster that utilizes shared storage by creating a 

Rajkumar and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Rajkumar teaches copying the same machine learning model to a plurality of different hosts. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, and D’Amato in view of Rajkumar because having the same machine learning model in common among many robots can provide a number of advantages. For example, the computational expense of training the model can be incurred once, at the server, for each model update, which avoids the need for individual robots to expend power and computing resources on model training. Distributing the same updated model also shares the combined learning of the robot fleet with each robot. All of the robots receive an updated model that provides greater recognition ability, allowing the performance of each robot to improve in a standardized and predictable way.

Wu and Schott are both concerned with virtualized computing environments. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Wu teaches autoscaling, throughput, and machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the 

Dippenaar and Schott are both concerned with virtualized computing environments. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Dippenaar teaches virtual machine slots. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, and Wu in view of Dippenaar because the virtual computer system service may migrate an existing virtual machine instance to a physical host based on customer preferences or requirements, and this may eliminate the practice of creating and terminating virtual machine instances if the physical host used to instantiate the instance does not initially match customer hardware specification requirements. Additionally, since the virtual computer system service may automatically migrate a virtual machine instance to a different physical host based on customer hardware specification requirements, the customer may no longer need to continuously submit requests to allocate an existing virtual machine instance to a physical host with the requested hardware specifications. This, in turn, may reduce 

As per claim 7, Wu teaches the first ISG adding hosts, or additional ISGs being spun up, in response to an increase in traffic among the plurality of ISGs ([0045] utilize auto-scaler for processing throughput which can dynamically spin up virtual machines responsive to the throughput using a machine learning model).

As per claim 10, D’Amato teaches wherein the request and path are routed according to listener rules ([0079] determination of how and where to route requests can be based on various policy considerations e.g. rules, including which connection has greater bandwidth, load balancing, etc.).

As per claim 14, it has similar limitations as claim 4 and is therefore rejected using the same rationale. 

As per claim 17, it has similar limitations as claim 7 and is therefore rejected using the same rationale. 

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Miserendino et al. (US 2016/0260023) (hereinafter Miserendino as previously cited).

As per claim 5, Miserendino teaches wherein each host maps to at least one virtual node and an identifier of the machine learning model is hashed to at least one of the virtual nodes ([0025] virtual machine clusters are provided for a plurality of machine learning models; hashing and mapping are considered equivalent, and where there is only one virtual node, the machine learning model could only be hashed/mapped to that one virtual node).

Miserendino and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Miserendino teaches a plurality of virtual machines for a plurality of machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Miserendino because it would improve on existing technologies by moving away from storing training and test sample metadata within the file path name to putting the training and test sample metadata in a searchable, extensible database which would facilitate automating the machine learning and testing processes. It would also move configuration management functions from spreadsheets to an automated service which increases the level of automation, and thus reduces user errors and time in developing and maintaining machine learning solutions.

As per claim 15, it has similar limitations as claim 5 and is therefore rejected using the same rationale. 

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, Miserendino, and Shiga et al. (US 2013/0332608) (hereinafter Shiga as previously cited).

As per claim 6, Shiga teaches wherein which host to route to is randomly determined and a location of the host to route to dictates the first ISG to route the utterance to ([0064], ll. 20-22 the overloaded node may randomly choose any number of other nodes to which to offload data).

Shiga and Schott are both concerned with virtualized computing environments. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Shiga teaches randomly choosing nodes. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Miserendino in view of Shiga because if resource utilization of a node exceeds a preset threshold, then the node is an overloaded node, and the overloaded node migrates out a part of the key-value pairs in the overloaded node in order to reduce the resource utilization to a level below the preset threshold.

As per claim 16, it has similar limitations as claim 6 and is therefore rejected using the same rationale. 

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Shiga.

As per claim 8, Shiga teaches wherein determining a first ISG to route the request to and determining a path to the first ISG further comprises: determining the first ISG is overloaded; generating an indication to create a new ISG to host machine learning models of the first ISG; providing the indication to a monitoring service ([0073] determine an overloaded node, whereby the overloaded node requests creation of a new node to a target node which in turn executes virtual node creation processing and sends a response back to the overloaded node).

Shiga and Schott are both concerned with virtualized computing environments. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Shiga teaches creating a new node in response to another node being overloaded. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Shiga because if resource utilization of a node exceeds a preset threshold, then the node is an overloaded node, and the overloaded node migrates out a part of the key-value pairs in the overloaded node in order to reduce the resource utilization to a level below the preset threshold.

As per claim 18, it has similar limitations as claim 8 and is therefore rejected using the same rationale. 

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, Crawford et al. (US 2017/0255248) (hereinafter Crawford as previously cited), and Attenberg et al. (US 2013/0282630) (hereinafter Attenberg as previously cited).

As per claim 9, the combination of references above teaches wherein determining the first ISG is overloaded comprises: calculating a first metric of a cache miss rate of the hosts of the first ISG over a time period (Crawford fig. 4, block 68); and calculating a second metric of a number of unique models requested in the first ISG over the time period (Attenberg [0037] determine a number of user actions requested, which in turn initiates a number of training sessions each using a different machine learning model), wherein when either the first or second metric exceeds a threshold the first ISG is overloaded (Crawford fig. 4, block 80 in the event the cache miss rate exceeds the threshold for longer than a preset period, power up an additional bank to handle the apparent overload).

Crawford and Schott are both concerned with computer caching. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Crawford teaches determining whether a cache miss rate has exceeded a threshold for a period of time. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Crawford because using a cache to store temporary copies of data items retrieved from memory allows both the latency and the energy expenditure associated with retrieving those data items from memory to be reduced. Hence, the benefit of reduced leakage power resulting from powering down a portion of the cache can be gained.

Attenberg and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Attenberg teaches determining a number of different machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Crawford in view of Attenberg because it would employ machine learning to improve processing of individual tasks based on comparison with human processing results. Once performance of a particular task by machine processing reaches a threshold, the level of human processing used on that task is reduced.

As per claim 19, it has similar limitations as claim 9 and is therefore rejected using the same rationale. 

Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Goli et al. (US 10,558,579) (hereinafter Goli as previously cited).

As per claim 11, Goli teaches wherein each host of the first ISG caches a first plurality of machine learning models loaded in random access memory and caches a second, different plurality of machine learning models according to a least frequently used caching model in disk (col. 4, ll. 27-46 and col. 9, ll. 56-59 tuning machine learning models to improve cache hit rates utilizing an LFU algorithm and the cache may be implemented in RAM).

Goli and Schott are both concerned with computer caching. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Goli teaches utilizing an LFU algorithm and tuning machine learning models to improve cache hit rates wherein the cache can be implemented in RAM. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Goli because it would provide for a system that can adapt itself to changing user access patterns and give good hit rates for both LRU-friendly and LFU-friendly workloads, resulting in consistently high hit rates that can be optimized for workloads that favor spatial locality or certain data types, improving the hit rate without adding considerable latency overhead.

As per claim 20, it has similar limitations as claim 11 and is therefore rejected using the same rationale. 

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Feast et al. (US 2018/0262539) (hereinafter Feast as previously cited).

As per claim 12, Feast teaches storing data including the request data and inference result in a data hub accessible to a subscribing entity ([0101] store probe data and results for inference engine to make inferences that subscribers may use).

Feast and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Feast teaches storing inference data for a subscriber. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Feast because it would provide a way to save time, money, and effort of both communicating parties by reducing or eliminating unnecessary attempts to connect where there is an insufficient likelihood of interaction success between the two parties.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Bellis et al. (US 2018/0375998) (hereinafter Bellis as previously cited).

As per claim 13, Bellis teaches wherein the request is received from a bot ([0073] server receives request from a chat bot).

Bellis and Schott are both concerned with a virtualized computing environment. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Bellis teaches receiving a request from a bot. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Bellis because collected information about a customer and/or the customer's historical information may be .

Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Emery et al. (US 2018/0181558) (hereinafter Emery as previously cited).

As per claim 1, it has similar limitations as claim 4 and is therefore rejected using the same rationale as claim 4. Claim 1 includes an additional recitation of utterance and bot. However, Emery teaches utterance and bot (fig. 4, 6, and 8 and [0031]-[0033]). 

Emery and Schott are both concerned with machine learning. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Emery teaches the interaction of a user with bots using utterances. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, and Dippenaar in view of Emery because it would provide an improved solution for facilitating conversations between users and a conversational bot, thereby improving a user's dialog experience with the conversational bot. The conversational bot routing engine may store and train a bot recommendation model for dynamically selecting the bot that can best answer a given query from a user, providing a better reply to the query. Thus, the dynamic selection of a proper bot for the user would improve the user experience.

As per claim 2, Wu teaches the first ISG adding hosts, or additional ISGs being spun up, in response to an increase in traffic among the plurality of ISGs ([0045] utilize auto-scaler for processing throughput which can dynamically spin up virtual machines responsive to the throughput using a machine learning model).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, Emery, and Goli.

As per claim 3, Goli teaches wherein each host of the first ISG caches a first plurality of machine learning models loaded in random access memory and caches a second, different plurality of machine learning models according to a least frequently used caching model in disk (col. 4, ll. 27-46 and col. 9, ll. 56-59 tuning machine learning models to improve cache hit rates utilizing an LFU algorithm and the cache may be implemented in RAM).

Goli and Schott are both concerned with computer caching. Schott teaches receiving inference requests and fulfilling the requests using machine learning to produce results while Goli teaches utilizing an LFU algorithm and tuning machine learning models to improve cache hit rates wherein the cache can be implemented in RAM. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Schott, Perone, D’Amato, Rajkumar, Wu, Dippenaar, and Emery in view of Goli because it would provide for a system that can adapt itself to changing user access patterns and give good hit rates for both LRU-friendly and LFU-friendly workloads, resulting in consistently high hit rates which .

Response to Arguments
Applicant's arguments have been considered but are moot in view of the new ground(s) of rejection necessitated by Applicant’s amendments. 

Relevant Art Not Cited
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure:

Zhang et al. (US 2019/0325305) disclose mapping a machine learning model to a multi-core inference accelerator engine.

Mimura et al. (US 2015/0222515) in at least the abstract disclose scaling virtual machines based on traffic load.

Kasturi et al. (US 2015/0363219) in at least [0217] and [0231] disclose scaling virtual machines based on data traffic.

Jain et al. (US 2016/0036838) in at least [0028] and [0090] disclose scaling out or instantiating virtual machines as traffic volume in the system increases.

Cucinotta et al. (US 2016/0210166) in at least [0020] disclose a scaling-out process of adding/instantiating virtual machines when a load of a VM exceeds a threshold in order to take a portion of the load from the overloaded VM.

Elyashev et al. (US 2010/0332657) in at least [0032] disclose generating a request to add a new virtual machine when a host is overloaded based on a utilization being higher than a predetermined threshold for more than a predetermined time period.

Edsall et al. (US 2004/0139167) in at least [0056] disclose adding a node of a same type if one node becomes over-loaded.

Chen et al. (US 5,553,235) in at least col. 83, ll. 63-65 disclose adding a faster processor if the CPU is overloaded, and adding more memory if the system is thrashing.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Lee whose telephone number is (571)270-3369.  The examiner can normally be reached on M-TH 8AM-5PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at 571-272-3721.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




	
/Adam Lee/
Primary Examiner, Art Unit 2193
March 15, 2022