DETAILED ACTION
Claims 1-20 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7, 8, 9, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 10,990,850, hereinafter Chen), in view of Kasaragod et al. (US 2019/0036716, hereinafter Kasaragod).

As per claim 1, Chen teaches a distributed inference system comprising: 
an end device (i.e., edge device, see at least Fig. 1) configured to: 
generate status information corresponding to the end device (i.e., metadata includes an identifier of the electronic device, a timestamp indicating when the sample was generated by the electronic device or transmitted by the electronic device, see at least column 11, lines 1-8), 
obtain target data (i.e., obtaining a plurality of samples generated by an electronic device, see at least column 10, lines 43-48), 
perform a first inference of the target data based on a first machine learning model and generate an inference result corresponding to the target data (i.e., first plurality of inference values generated by a first machine learning (ML) model executed by the electronic device using the plurality of samples, see at least column 10, lines 43-48), and 
transmit the status information and the inference result (i.e., inference values together with other associated metadata are sent to the provider network, see at least column 4, lines 40-44, column 10, lines 49-57, column 11, lines 1-8); and 
a server (see at least Fig. 1, Fig. 6, column 24, lines 34-45) configured to: 
receive the status information and the inference result (i.e., obtaining metadata for the samples, metadata include inference values, an identifier of the electronic device, a timestamp, see at least column 9, lines 32-37, column 11, lines 1-8), 
create a second machine learning model (i.e., retraining the first ML model to yield a retrained model, see at least column 11, lines 29-32), 
generate accuracy information corresponding to an accuracy of the inference result (i.e., evaluating the performance of the deployed model based on the ground truth values and the corresponding inferences from the deployed model, see at least column 9, lines 62-65), and 
transmit the second machine learning model to the end device based on the accuracy information (i.e., if performance is not adequate, retrain the deployed model, cause the retrained model to be deployed to the edge device, see at least column 10, lines 5-11, column 11, lines 40-41).
Chen does not explicitly teach that the second machine learning model is created based on the status information and a training dataset comprising the inference result.
Kasaragod teaches creating a second machine learning model based on the status information and a training dataset comprising the inference result (i.e., generate an update for the remote data processing model of the computing device based on the remote result, generate updates to one or more respective local models at the respective edge devices based on a state of the one or more respective edge devices, see at least [0082], [0087], [0111]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the second machine learning model is created based on the status information and a training dataset comprising the inference result, as similarly taught by Kasaragod, because Kasaragod shows that training a machine learning model on inputs known in the art, such as the states of edge devices and inference results, was a known technique (see at least [0082], [0087], [0111] of Kasaragod).

As per claim 7, Chen teaches wherein the end device comprises a first inference engine configured to perform the first inference of the target data and generate an inference request or hit data based on the inference result (i.e., machine learning model can be used to generate inferences typically using a sample, application may send samples during moments of low-probability inferences, see at least column 4, line 20 – column 6, line 61), and 
wherein the server comprises a second inference engine configured to perform a second inference of the target data based on the inference request and generate the accuracy information based on the inference result (i.e., teacher model may be run, with the inference values from the teacher model, compare how well deployed model performs compared to the teacher model, see at least column 6, line 45 - column 7, line 11).

As per claim 8, Chen teaches wherein the server comprises: an inference engine configured to: 
perform an inference of the target data in response to an inference request from the end device (i.e., teacher model may be run, see at least column 4, line 20 – column 6, line 61) and 
calculate the accuracy information based on the inference result (i.e., compare how well deployed model performs compared to the teacher model, see at least column 6, line 45 - column 7, line 11); 
a device manager configured to generate a command for creating the second machine learning model, based on the accuracy information (i.e., when the performance of the deployed ML model is not satisfactory, the model adaptation controller can retrain the currently-deployed ML model and issue a request to the machine learning service to perform a training job, which returns a trained model, see at least column 7, lines 40-54); and 
a training module configured to create the second machine learning model based on the command (i.e., issue a request to the machine learning service to perform a training job, which returns a trained model, see at least column 7, lines 40-54).

As per claim 9, Chen teaches wherein the training dataset further comprises the target data and the accuracy information (i.e., evaluating performance of the deployed model based on ground truth values and the corresponding inferences, the evaluation may be based on determining a metric, see at least column 9, line 25 – column 10, line 23), and 
wherein the training module selects at least a portion of the training dataset based on the accuracy information and creates the second machine learning model based on the selected portion of the training dataset (i.e., retrain the currently deployed ML model using a number of samples, see at least column 7, lines 40-56, column 9, line 25 – column 10, line 14). 
Chen does not explicitly teach that the second machine learning model is created based on the status information.
Kasaragod teaches creating a second machine learning model based on the status information (i.e., generate updates to one or more respective local models at the respective edge devices based on a state of the one or more respective edge devices, see at least [0082], [0087], [0111]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the second machine learning model is created based on the status information, as similarly taught by Kasaragod, because Kasaragod shows that training a machine learning model on inputs known in the art, such as the states of edge devices, was a known technique (see at least [0082], [0087], [0111] of Kasaragod).

As per claim 13, the limitations recited in this claim are similarly recited in claim 1. Therefore, claim 13 is rejected for the same reasons as claim 1. 

Claims 2-6 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Sridhara et al. (US 2014/0237595, hereinafter Sridhara), further in view of Guim Bernat et al. (US 2019/0044831, hereinafter Guim), further in view of Poorchandran et al. (US 2019/0044882, hereinafter Poorchandran).

As per claim 2, Chen does not explicitly teach wherein the server comprises: an agent manager configured to register the end device based on the status information; and a device manager configured to: generate a grade of the end device based on the status information, and calculate a priority of the end device based on the grade.
Sridhara teaches an agent manager configured to register the end device based on the status information (i.e., the server processor may associate the device-specific lean classifier model with the capability and state identified in the information received from the requesting mobile computing device, see at least [0153], [0154]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the server comprises: an agent manager configured to register the end device based on the status information as similarly taught by Sridhara to organize models in a database and to perform database lookup operations to obtain stored models matching capabilities and/or states/configurations present on a requesting mobile computing device (see at least [0153], [0154] of Sridhara).
Guim teaches a device manager configured to: calculate a priority of the end device (i.e., priority may be determined by the additional processing logic in the gateway, see at least [0065]).
Poorchandran teaches a device manager configured to: generate a grade of the end device based on the status information (i.e., identify a context for each of the compute devices, see at least [0045], [0046]), and calculate a priority of the end device based on the grade (i.e., determine a priority for each context identifier, see at least [0047]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen to include a device manager configured to: generate a grade of the end device based on the status information, and calculate a priority of the end device based on the grade, as similarly taught by Guim and Poorchandran, in order to prioritize inference tasks (see at least [0052], [0064] of Guim), where the priority is based on the context of the device (see at least [0052], [0057] of Guim, [0045], [0046], [0047] of Poorchandran).

As per claim 3, Chen does not explicitly teach wherein the agent manager is further configured to generate manual information for controlling an operation of the end device based on the grade.
Poorchandran teaches an agent manager is further configured to generate manual information for controlling an operation of the end device based on the grade (i.e., bandwidth moderator is configured to determine how to divide the total available bandwidth across connected compute devices based on context priority, to define a prioritization schedule, see at least [0045], [0046], [0050], [0051]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the agent manager is further configured to generate manual information for controlling an operation of the end device based on the grade as similarly taught by Poorchandran in order to provide scheduling for a plurality of compute devices sharing a resource.

As per claim 4, Chen teaches wherein the server further comprises an inference engine configured to perform a second inference of the target data in response to an inference request from the end device (i.e., issue inference request for each of the samples to be labeled, see at least column 6, line 45 – column 8, line 19).
Chen does not explicitly teach wherein the inference request is scheduled based on the priority of the end device.
Guim teaches that the inference request is scheduled based on the priority of the end device (i.e., the handling of a lower-priority request can be dropped or delayed on the platform by the service logic in order to timely service the high priority request, see at least [0057], [0058], [0064], [0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the inference request is scheduled based on the priority of the end device, as similarly taught by Guim, in order to prioritize inference tasks (see at least [0052], [0064] of Guim).

As per claim 5, Chen does not explicitly teach wherein the end device is further configured to set a priority of the inference request and provide the inference request to the inference engine based on the priority of the inference request, and wherein, when the priority of the inference request and the priority of the end device are different, the inference engine schedules the inference request based on the priority of the inference request.
Guim teaches wherein the end device is further configured to set a priority of the inference request and provide the inference request to the inference engine based on the priority of the inference request (i.e., requester determines priority of request, see at least [0057], [0064]).
Poorchandran teaches wherein, when the priority of the request and the priority of the end device are different, the engine schedules the request based on the priority of the request (i.e., determine a priority for each context identified for each compute device, which may be based on an individual priority associated with each activity, device, etc., that can be influence by one or more applicable weights relative to the context, see at least [0047]-[0049]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the end device is further configured to set a priority of the inference request and provide the inference request to the inference engine based on the priority of the inference request, and wherein, when the priority of the inference request and the priority of the end device are different, the inference engine schedules the inference request based on the priority of the inference request, as similarly taught by the combination of Guim and Poorchandran, in order to determine priorities based on individual priorities (see at least [0047]-[0049] of Poorchandran).

As per claim 6, Chen does not explicitly teach wherein, when a priority of another end device is higher than a priority corresponding to the inference request, the inference engine stops performing the second inference of the target data based on status information of the other end device.
Guim teaches when a priority of another end device is higher than a priority corresponding to the inference request, the inference engine stops performing the second inference of the target data based on status information of the other end device (i.e., the handling of a lower-priority request can be dropped or delayed on the platform by the service logic in order to timely service the high priority request, see at least [0064]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that when a priority of another end device is higher than a priority corresponding to the inference request, the inference engine stops performing the second inference of the target data based on status information of the other end device as similarly taught by Guim in order to timely service high priority requests (see at least [0052], [0064] of Guim).

Claims 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Gurumoorthy et al. (US 2008/0154805, hereinafter Gurumoorthy).

As per claim 10, Chen does not explicitly teach wherein the server is further configured to provide size information of the second machine learning model to the end device, and wherein the end device is further configured to decide a time to receive the second machine learning model based on the size information.
Gurumoorthy teaches wherein the server is further configured to provide size information of a file to the end device (i.e., size of the file supplied by the management console, see at least Fig. 2, [0042]), and 
wherein the end device is further configured to decide a time to receive the file based on the size information (i.e., the file may be scheduled to be downloaded based on the size of the file, see at least [0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that the server is further configured to provide size information of the second machine learning model to the end device, and wherein the end device is further configured to decide a time to receive the second machine learning model based on the size information, as similarly taught by Gurumoorthy for the download of a file, in order for the download to occur at a time that is least intrusive to the end user (see at least [0035] of Gurumoorthy).

As per claim 15, Chen teaches wherein the transmitting the second machine-learning model comprises: renewing of the first machine learning model with the second machine learning model (i.e., download the retrained model, see at least column 7, line 63 – column 8, line 10).
Chen does not explicitly teach providing size information of the second machine learning model to the end device; and determining a time to receive the second machine learning model based on the size information.
Gurumoorthy teaches providing size information of a file to the end device (i.e., size of the file supplied by the management console, see at least Fig. 2, [0042]), and 
determining a time to receive the file based on the size information (i.e., the file may be scheduled to be downloaded based on the size of the file, see at least [0035]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen to provide size information of the second machine learning model to the end device; and determine a time to receive the second machine learning model based on the size information as similarly taught by Gurumoorthy for the download of a file in order for the download to occur at a time that is least intrusive to the end user (see at least [0035] of Gurumoorthy).


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Gurumoorthy, further in view of Dang et al. (US 5,787,446, hereinafter Dang).

As per claim 11, Chen teaches wherein the end device is further configured to receive the second machine learning model after issuing an inference request to the server (i.e., issue inference request for each of the samples to be labeled, when the performance of the deployed ML model is not satisfactory, cause the retrained model to be deployed to the edge device, see at least column 6, line 45 – column 8, line 19).
Chen does not explicitly teach erasing the first machine learning model when a size of the second machine learning model is greater than a free space of the end device.
Dang teaches erasing a previous version of a file when a size of the file is greater than a free space of an end device (i.e., if the identified piece of media lacks sufficient storage space, first deletes the old version of the file, see at least Fig. 3A, column 5, lines 47-55).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that, when a size of the second machine learning model is greater than a free space of the end device, the first machine learning model is erased, as similarly taught by Dang for a file, because the edge devices could have resource constraints (see at least column 2, lines 54-59 of Chen), and thus it would have been obvious to check whether sufficient storage is available and to free storage space when the storage is not needed.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Gurumoorthy, further in view of Anand et al. (US 2008/0256314, hereinafter Anand).

As per claim 12, Chen does not explicitly teach wherein, when a size of the second machine learning model is smaller than a free space of the end device, the end device is further configured to erase the first machine learning model after receiving the second machine learning model.
Anand teaches wherein, when a size of the collection of data is smaller than a free space of the end device, the end device is further configured to erase a previous version of the collection of data after receiving the collection of data (i.e., if the estimated amount of storage space is less than the identified available amount, the previous version of the collection of data is deleted, see at least Fig. 4, [0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that, when a size of the second machine learning model is smaller than a free space of the end device, the end device is further configured to erase the first machine learning model after receiving the second machine learning model, as similarly taught by Anand for a collection of data, because the edge devices could have resource constraints (see at least column 2, lines 54-59 of Chen), and thus it would have been obvious to check whether sufficient storage is available and to free storage space that is not needed. 

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Sridhara.

As per claim 14, Chen teaches wherein the determining whether to create the second machine learning model comprises:
calculating an accuracy of the inference result (i.e., the performance (accuracy) of the deployed ML model is satisfactory, see at least column 7, lines 5-35);
determining whether the accuracy is within a reference range (i.e., whether the performance (accuracy) of the deployed ML model is satisfactory, e.g., the same as, or within a threshold amount of, the teacher model, see at least column 7, lines 5-12).
Chen does not explicitly teach determining whether a previously created machine learning model is present in a caching module of the server.
Sridhara teaches determining whether a previously created machine learning model is present in a caching module of the server (i.e., the server processor may send a stored/cached lean classifier model to a mobile computing device requesting a classifier model in response to determining that the mobile computing device has a set of features that matches the features represented in the stored model, see at least [0007]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen to determine whether a previously created machine learning model is present in a caching module of the server as similarly taught by Sridhara to allow matching of stored models to mobile computing devices (see at least [0007] of Sridhara).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Chen, in view of Kasaragod, further in view of Gurumoorthy, further in view of Dang, further in view of Anand.

As per claim 16, Chen teaches the end device receives the second machine learning model after issuing an inference request to the server (i.e., issue inference request for each of the samples to be labeled, when the performance of the deployed ML model is not satisfactory, cause the retrained model to be deployed to the edge device, see at least column 6, line 45 – column 8, line 19).
Chen does not explicitly teach wherein, when a size of the second machine learning model indicated by the size information is greater than a free space of the end device, the end device erases the first machine learning model, and wherein, when the size of the second machine learning model is smaller than the free space of the end device, the end device erases the first machine learning model after receiving the second machine learning model. (Examiner notes that the claim limitations are contingent limitations. The broadest reasonable interpretation of a method claim having contingent limitations does not include steps that are not performed because the conditions precedent are not met. MPEP 2111.04(II). Thus, the broadest reasonable interpretation of method claim 16 does not require the limitations of claim 16. However, Examiner is addressing these limitations for the compact prosecution of claims.)
Dang teaches erasing a previous version of a file when a size of the file indicated by size information is greater than a free space of an end device (i.e., if the identified piece of media lacks sufficient storage space, first deletes the old version of the file, see at least Fig. 3A, column 5, lines 47-55).
Anand teaches wherein, when a size of the collection of data is smaller than a free space of the end device, the end device is further configured to erase a previous version of the collection of data after receiving the collection of data (i.e., if the estimated amount of storage space is less than the identified available amount, the previous version of the collection of data is deleted, see at least Fig. 4, [0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Chen such that, when a size of the second machine learning model indicated by the size information is greater than a free space of the end device, the end device erases the first machine learning model, and, when the size of the second machine learning model is smaller than the free space of the end device, the end device erases the first machine learning model after receiving the second machine learning model, as similarly taught by Dang and Anand, because the edge devices could have resource constraints (see at least column 2, lines 54-59 of Chen), and thus it would have been obvious to check whether sufficient storage is available and to free storage space that is not needed. 

Claims 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kasaragod, in view of Guim.

As per claim 17, Kasaragod teaches an operating method of a distributed inference system, the method comprising:
providing status information corresponding to an end device to a server (i.e., model training service of the provider network and/or the model trainer may obtain one or more indications of the state of the one or more edge devices, see at least [0130]);
performing, at the end device, a first inference of target data obtained by the end device (i.e., an edge device may generate a prediction based on processing of the data using the model of the edge device, see at least [0151]);
providing, by the end device, a first inference request to the server based on an inference result of the target data (i.e., the tier manager may determine whether a confidence level of the prediction is below a threshold confidence level, if so, then the tier manager may send the data to a tier device for processing by a model of the tier device, see at least [0151]).
Kasaragod does not explicitly teach generating, at the server, a priority corresponding to the end device based on the status information; scheduling, at the server, the first inference request based on the priority corresponding to the end device.
Guim teaches generating, at the server, a priority corresponding to the end device based on status information (i.e., priority may be determined by the additional processing logic in the gateway, for example, information about the detected person is sent in a request to the gateway for an object-in-vehicle-path inference model, car stopped at the intersection for the red light, this is determined to be a low-priority, high latency request, car is moving at the speed limit towards the intersection, this is determined to be a high-priority low-latency request, see at least [0052], [0057], [0058], [0060], [0065]); 
scheduling, at the server, the first inference request based on the priority corresponding to the end device (i.e., the handling of a lower-priority request can be dropped or delayed on the platform by the service logic in order to timely service the high priority request, see at least [0057], [0058], [0064], [0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kasaragod to generate, at the server, a priority corresponding to the end device based on the status information and to schedule, at the server, the first inference request based on the priority corresponding to the end device, as similarly taught by Guim, in order to prioritize inference tasks, where the priority is based on the context of the device (see at least [0052], [0057], [0058], [0064], [0065] of Guim).

As per claim 20, Kasaragod teaches performing, at the server, an inference of the target data in response to the first inference request (i.e., send the data to a tier device for processing by a model of the tier device, see at least [0151]).
Kasaragod does not explicitly teach determining whether a priority of a second end device with a second inference request is higher than the priority of the end device; and when the second inference request with the higher priority exists, stopping the inference of the target data in response to the first inference request based on status information of the second end device. (Examiner notes that the claim limitation “when the second inference request . . . the second end device” is a contingent limitation. The broadest reasonable interpretation of a method claim having contingent limitations does not include steps that are not performed because the conditions precedent are not met. MPEP 2111.04(II). Thus, the broadest reasonable interpretation of method claim 20 does not require the limitation “when the second inference request . . . the second end device”. However, Examiner is addressing this limitation for the compact prosecution of claims.)
Guim teaches determining whether a priority of a second end device with a second inference request is higher than the priority of the end device; and when the second inference request with the higher priority exists, stopping the inference of the target data in response to the first inference request based on status information of the second end device (i.e., the handling of a lower-priority request can be dropped or delayed on the platform by the service logic in order to timely service the high priority request, see at least [0064]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kasaragod to determine whether a priority of a second end device with a second inference request is higher than the priority of the end device and, when the second inference request with the higher priority exists, to stop the inference of the target data in response to the first inference request based on status information of the second end device, as similarly taught by Guim, in order to timely service high priority requests (see at least [0052], [0064] of Guim).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Kasaragod, in view of Guim, further in view of Poorchandran.

As per claim 18, Kasaragod does not explicitly teach wherein the generating of the priority comprises: assigning a grade corresponding to the end device based on the status information; and calculating the priority based on the grade, and wherein a priority between the end device and another end device is decided based on a grade assigned to each of the end devices based on the status information.
Poorchandran teaches that generating the priority comprises: 
assigning a grade corresponding to the end device based on the status information (i.e., identify a context for each of the compute devices, see at least [0045], [0046]); and
calculating the priority based on the grade (i.e., determine a priority for each context identifier, see at least [0047]), and 
wherein a priority between the end device and another end device is decided based on a grade assigned to each of the end devices based on the status information (i.e., determine a priority for each context identifier, see at least [0047]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kasaragod such that the generating of the priority comprises: assigning a grade corresponding to the end device based on the status information; and calculating the priority based on the grade, and wherein a priority between the end device and another end device is decided based on a grade assigned to each of the end devices based on the status information as similarly taught by Poorchandran in order to determine priority based on context of the device (see at least [0045], [0046], [0047] of Poorchandran).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Kasaragod, in view of Guim, further in view of Roskind (US 10,944,683).

As per claim 19, Kasaragod does not explicitly teach wherein, when a second end device having the priority issues a second inference request to the server, the second inference request is scheduled based on an order of receiving the first inference request and the second inference request. (Examiner notes that the claim limitations are contingent limitations. The broadest reasonable interpretation of a method claim having contingent limitations does not include steps that are not performed because the conditions precedent are not met.  MPEP 2111.04(II).  Thus, the broadest reasonable interpretation of method claim 19 does not require the limitations of claim 19.  However, Examiner is addressing these limitations for the compact prosecution of claims.)
Guim teaches when a second end device having the priority issues a second inference request to the server, the second inference request is scheduled (see at least [0064]). 
Roskind teaches a second request is scheduled based on an order of receiving a first request and a second request (i.e., service provider systems that serve many clients generally use first-in-first-out (FIFO) request queues to store pending requests to the system, scheduler may enqueue the new request to the FIFO queue, see at least column 2, lines 61-63, column 7, lines 5-7).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kasaragod to schedule the second inference request based on an order of receiving the first inference request and the second inference request as similarly taught by Roskind, because it would have been obvious to use known methods for scheduling the processing of requests, such as first-in-first-out (FIFO) scheduling.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kuo et al. (US 2019/0156246) is cited to teach generating and deploying packages for machine learning at edge devices.
Pezzillo et al. (US 2019/0370687) is cited to teach machine learning at edge devices based on distributed feedback.
Power et al. (US 2020/0287821) is cited to teach priority of devices and requests in data service requests.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655.  The examiner can normally be reached on M-F 9:30 am - 5:00pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Jue Louie/
Primary Examiner
Art Unit 2121