DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this Office action.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on June 25th, 2019 and December 29th, 20 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 10-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" by Xukan Ran et al. (hereinafter “Ran”).

	Regarding claim 1, Ran teaches a system comprising ([Image: media_image1.png] {See Fig. 1 above, which discloses a system}):

a processor (Ran: Introduction, pg. 43, Fig. 1: [Image: media_image2.png] {Examiner correlates the mobile phone as having a processor to analyze the input video and display an output afterwards}); and

a storage memory storing computer-readable instructions, which when executed by the processor, cause the processor to (Ran: Introduction, pg. 42, a typical Android phone; that is, the video streams cannot be analyzed in real time [2]. Even with speedup from the mobile GPU [13], typical processing times are approximately 600 ms, which is equivalent to less than 1.7 frames per second, and is still not acceptable for real time processing. Pg. 43: [Image: media_image3.png] {Examiner correlates the Android phone as having storage memory to process the video stream and analyze the stream based on its parameters}):

receive a video query regarding a live video stream (Ran: Introduction, pg. 42; [Image: media_image4.png] {Examiner correlates the input video as the video query});

determine resources available to the system and a defined threshold confidence value associated with the video query (Ran: Introduction, pg. 42; [Image: media_image5.png] {Examiner correlates determining the resources available to the system with determining the tradeoffs of object detection (from the scene of the video query) within the system parameters according to the offloading decision (see Fig. 1 below, in which the Small CNN runs on the mobile phone and the Big CNN runs on the server); see also Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]”, to determine that the resources are available} [Image: media_image4.png]);

select a configuration for processing the video query based at least on the determined resources (Ran: Introduction, pg. 42; [Image: media_image5.png] {Examiner correlates selecting the configuration for processing the video query based on the resources with the system deciding, from the object detection tradeoffs and according to its parameters, an image resolution, a model size, and an offloading decision, so as to provide the optimal decision by utilizing the offloading decision engine of Fig. 1 (shown below) [Image: media_image4.png]});
allocate processing between one or more cameras and one or more edge devices according to the selected configuration (Ran: [Image: media_image4.png] Introduction, pg. 42; [Image: media_image7.png] [Image: media_image8.png] [Image: media_image9.png] {Examiner correlates allocating processing between the camera and the edge devices with adjusting the resolution of the camera according to the input and reviewing the results: lower resolution requires less computation and is fed to the Small CNN run locally on the phone, while higher resolution requires more computation and is sent to the cloud, based on the associated learnable parameters of the device. The mobile phone includes a camera module, and thus the resulting dimensions are based on the resolution of the device and the camera to provide the best configuration}); and

adjust the selected configuration to include processing among one or more cloud devices when processing results from the one or more cameras and the one or more edge devices do not meet the defined threshold confidence value (Ran: Introduction, pg. 42; [Image: media_image10.png] [Image: media_image11.png] [Image: media_image12.png] {Examiner correlates adjusting the frame resolution as adjusting the selected configuration according to the Big CNN, which is run on the cloud server in Fig. 1 based on the offloading decision engine. The detection accuracy, as one of the parameters, would be considered in determining whether the processing is applied at the camera or offloaded to the Big CNN server (see Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]”)}).

	Regarding claim 10, Ran teaches that the instructions, when executed by the processor, further cause the processor to: dynamically determine whether resources available to the system have changed (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]” {Examiner correlates determining whether the resources are available with measuring the bandwidth and frame rate and, based on the policy, determining the condition under which the system is suited to run}); and

	when the resource availability has changed, modify the allocation of processing among the one or more cameras, the one or more edge devices, and the one or more cloud devices based at least on the resource availability having changed (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]” {Examiner correlates adjusting the frame resolution as adjusting the selected configuration according to the Big CNN, which is run on the cloud server in Fig. 1 based on the offloading decision engine. The detection accuracy, as one of the parameters, would be considered in determining whether the processing is applied at the camera or offloaded to the Big CNN server (see Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]”)}).
	Regarding claim 11, Ran teaches determining resources available to the system further comprises determining whether network connectivity to the one or more cloud devices is available (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]” {Examiner correlates determining the resources available based on the network connectivity with determining, under the offload policy, whether the CNN on the phone or the CNN on the server is better suited to run}).

	Regarding claim 12, Ran teaches the selected configuration is adjusted to an edge-only mode of processing by allocating all processing between the one or more cameras and the one or more edge devices when network connectivity to the one or more cloud devices is unavailable or bandwidth to the one or more cloud devices is insufficient (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image6.png]” {Examiner correlates determining the resources available based on the network connectivity with determining, under the offload policy, whether the CNN on the phone or the CNN on the server is better suited to run}).

	Regarding claim 13, Ran teaches a method comprising: allocating processing of input data between one or more edge devices and one or more cloud devices, the one or more edge devices using an edge processing model, and the one or more cloud devices using a cloud processing model different from the edge processing model ([Image: media_image2.png] {Examiner correlates Fig. 1, which shows the Small CNN model running on the mobile phone and the Big CNN running on the server});

[Attorney Docket No.: 406470-US-NP]

determining a current network capability between the one or more edge devices and one or more cloud devices (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image13.png]” {Examiner correlates determining the network capability between the edge device and the cloud device with monitoring the network bandwidth and frame rates}); and

shifting processing load of the input data to increase processing by the one or more edge devices using a moderate computationally-intensive algorithm upon determining that the current network capability between the one or more edge devices and the one or more cloud devices is unavailable (Ran: 4.3 Impact of variable network conditions, pg. 45, “[Image: media_image13.png]” {Examiner correlates the offloading with determining to shift the processing load to increase processing on the mobile device: when the frame rate is observed to be below a threshold, the processing is run on the mobile phone instead}).

	Regarding claim 14, Ran teaches further comprising allocating processing to one or more smart devices, the one or more smart devices performing processing that is computationally cheaper than the edge processing model used by the one or more edge devices (Ran: pg. 44-45, “[Image: media_image14.png] [Image: media_image15.png]” {Examiner correlates allocating processing between the camera and the edge devices with adjusting the resolution of the camera according to the input and reviewing the results: lower resolution requires less computation, while higher resolution requires more computation, based on the associated learnable parameters of the device}).

	Regarding claim 15, Ran teaches further comprising dynamically shifting the processing load of the input data back to the one or more cloud devices upon determining that the current network capability between the one or more edge devices and the one or more cloud devices has been restored (Ran: pg. 44-45, “[Image: media_image6.png]” {Examiner correlates the offloading with determining to shift the processing load to increase processing on the mobile device: when the frame rate is observed to be below a threshold, the processing is run on the mobile phone instead}).

	Regarding claim 16, Ran teaches the cloud processing model is a more computationally expensive model than the edge processing model (Ran: pg. 44-45, “[Image: media_image14.png] [Image: media_image15.png]” {Examiner correlates the cloud server model as more expensive, as the frame rates are significantly higher; however, the accuracy is much higher and smoother compared to the smaller model on the phone, where offloading may be helpful to transmit data}).

	Regarding claim 17, Ran teaches a method comprising: receiving input video data from one or more cameras ([Image: media_image2.png] {Examiner correlates Fig. 1, which shows the input video data from the camera to be processed});

accessing a database of a plurality of video processing configurations (Ran: pg. 43, “[Image: media_image16.png]” and pg. 45, “[Image: media_image6.png]” {Examiner correlates the database of a plurality of video processing configurations with the policy (profiler) for processing the video, such that when the data is being monitored, a threshold may be observed to determine whether processing should be run on the mobile phone or on the server});

evaluating the plurality of video processing configurations against resource availability across local devices and cloud devices (Ran: pg. 45, “[Image: media_image6.png]” {Examiner correlates evaluating the video processing configurations against the resource level across the local device and the cloud device with monitoring the network bandwidth and observing the offloading policy to maximize the frame rate, such that processing below a threshold would be run on the phone and processing above the threshold would be run on the server}); and

selecting a configuration that allocates processing to the one or more cameras, one or more edge devices, and one or more cloud devices (Ran: pg. 45, “[Image: media_image6.png]” {Examiner correlates selecting a configuration for allocation with monitoring the frame rate: based on the condition that is met, processing would be run on the phone if the rate is low and on the server if the rate is high}).

	Regarding claim 18, Ran teaches the video processing configurations specify a frame resolution, frame rate, and a type of DNN model to be used in processing the input video data (Ran: pg. 44-45, “[Image: media_image17.png]” {Examiner correlates adapting the different image resolutions (frame resolution) and the frame rate (low or high) to be processed on the type of DNN model (phone or server), which is selected based on the condition that is met according to the parameters}).

	Regarding claim 19, Ran teaches the video processing configurations each have a resource cost, and a configuration is selected that achieves an optimal tradeoff between resource cost and average accuracy (Ran: pg. 45, “[Image: media_image6.png]” {Examiner correlates selecting a configuration according to a tradeoff with monitoring the frame rate: based on the condition that is met, processing would be run on the phone if the rate is low and on the server if the rate is high} [Image: media_image18.png]).

	Regarding claim 20, Ran teaches further comprising dynamically modifying the selected configuration upon determining that the resource availability has changed (Ran: pg. 45, “[Image: media_image6.png]” {Examiner correlates dynamically modifying the selected configuration based on the resource availability with monitoring the frames and applying the appropriate CNN model based on whether the frame rate has changed when observing the input video query}).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-9 are rejected under 35 U.S.C. 103 as being unpatentable over Non-Patent Literature "Delivering Deep Learning to Mobile Devices via Offloading" by Xukan Ran et al. (hereinafter “Ran”) in view of Non-Patent Literature "A Computing Platform for Video Crowdprocessing Using Deep Learning" by Lu et al. (hereinafter "Lu").

Regarding claim 2, Ran teaches the claimed invention substantially as claimed; however, Ran does not explicitly teach that the selected configuration directs the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module.

Lu teaches the selected configuration directs the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module ([Image: media_image19.png] See Lu: pg. 1431, III. OVERVIEW: We consider a crowdprocessing approach to perform object detection/classification of videos. This task includes filtering of videos based on metadata, and then processing the videos to perform object detection. Alternatively, the user may perform frame extraction locally and then offload specific frames to the cloud. In this case, it needs to determine whether each frame is processed by (ii) frame offload in which the frame is sent to the cloud for detection or (iii) local detection where the frame is detected on the mobile device {Examiner correlates that, based on the frame extraction, heavy processing would be offloaded to the cloud to perform the video processing, while light processing can be performed immediately by running the object detection locally and then delivering the result to the cloud}).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ran with the teachings of Lu, so that receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting, as taught by Ran, includes the selected configuration directing the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module, as taught by Lu. Doing so improves Ran’s teaching of selecting the best configuration by incorporating Lu’s teaching of frame extraction and offloading, thereby improving the performance of batch processing with respect to completion time and energy consumption (Lu: pg. 1435, B. Adaptive Algorithm: “starts processing, the frames should not to be offloaded to avoid duplicate processing. Therefore, it is difficult to determine the number of frames included in batch processing, since offloading frames may improve performance during batch processing”).

	Regarding claim 3, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the selected configuration directs the one or more cameras or the one or more edge devices to perform background subtraction on the extracted video frames (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the difference in rate when detecting the object with determining the best batch size to be delivered}).

	Regarding claim 4, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Lu further teaches the background subtraction is performed on the extracted video frames to determine whether additional processing should be performed (Lu: pg. 1432, A. Frame Extraction, “Frame extraction is used to take individual video frames and transform them into images upon which object detection may be performed. To target objects with different dynamics within the video, the task issuer may request a different frame extraction rate. For example, for an object moving at a high speed like a car, the rate should be high enough to not miss the object”. B. Detection, “However, batch processing performs much better, where the intercept (α) is about 240ms and the slope (β) is 400ms. The difference grows with the increase of the number of frames. For detection, it is better to put more frames in a batch to reduce processing time. However, the system must wait longer to get the extracted frames from the video. Therefore, it is difficult to determine the best batch size” {Examiner correlates the additional processing with observing the growing difference in rate when detecting the object, which allows more frames to be put in the batch and requires additional processing to determine the best batch size}).

	Regarding claim 5, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the selected configuration directs the one or more cameras or the one or more edge devices to perform processing of the extracted video frames using a lightweight DNN model locally on the one or more cameras or the one or more edge devices (Ran: pg. 45, “[Image: media_image20.png]” {Examiner correlates the offloading with determining the best course of action for the video processing, such that when the frame rate is below a certain threshold, the CNN can be run on the mobile phone (lightweight DNN model)}).

	Regarding claim 6, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the selected configuration directs the one or more cloud devices to perform processing of the extracted video frames using a heavy DNN model when results from the lightweight DNN model do not meet the defined threshold confidence value (Ran: pg. 45, “[Image: media_image20.png]” {Examiner correlates the offloading with determining the best course of action for the video processing, such that when the frame rate is above a certain threshold, the CNN can be run on the server (heavyweight DNN model)}).

	Regarding claim 7, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the lightweight DNN model comprises at least a first lightweight DNN model, and a second lightweight DNN model that requires additional computational resources than the first lightweight DNN model, but less computational resources than the heavy DNN model (Ran: pg. 45-46, “[Image: media_image20.png] [Image: media_image21.png]” {Examiner correlates the trade-off among the deep neural networks, in which cost is weighed against the accuracy of the framework by observing the threshold value to determine how to run the frame according to the parameters, with either less processing or heavy processing}).

	Regarding claim 8, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Ran further teaches the heavy DNN model comprises at least a first heavy DNN model, and a second heavy DNN model that requires additional computational resources than the first heavy DNN model (Ran: pg. 45-46, “[Image: media_image20.png] [Image: media_image21.png]” {Examiner correlates the trade-off among the deep neural networks, in which cost is weighed against the accuracy of the framework by observing the threshold value to determine how to run the frame according to the parameters, with either less processing or heavy processing}).

	Regarding claim 9, the modification of Ran and Lu teaches the claimed invention substantially as claimed, and Lu further teaches that the instructions, when executed by the processor, further cause the processor to: assign tags to objects discovered during processing of the extracted video frames (Lu: pg. 1431, III. OVERVIEW, “Alternatively, the user can process some of the frames locally and offload others… Once these videos are processed using deep learning either locally on the mobile device or remotely on the cloud, the task issuer will receive information about the videos related to the query. For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest, or the entire video to the cloud. Participation in the task may reveal personal information and intrude on users’ privacy, but users are allowed to filter out their personal videos” {Examiner correlates assigning tags to the objects with observing the extracted video frames: the frames that are processed on the device include the tags and frames of interest requested by the user, which are then sent to the cloud}); and

	store the tags in an index database for use in locating the objects in response to a query on a stored version of the live video stream (Lu: pg. 1431, III. OVERVIEW, “Once these videos are processed using deep learning either locally on the mobile device or remotely on the cloud, the task issuer will receive information about the videos related to the query. For frames that are processed on the mobile devices, the user will forward either the tags, the frames of interest, or the entire video to the cloud” {Examiner correlates that the videos are processed locally on the mobile device or remotely on the cloud; the received query would be processed as requested, and the user can forward the data to the cloud if the processing was done locally on the phone}).

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ran with the teachings of Lu, so that receiving a video query regarding a live video stream and determining the best allocation to select the recommended configuration setting, as taught by Ran, includes the selected configuration directing the one or more cameras or the one or more edge devices to extract video frames from the live video stream using a decoding module, as taught by Lu. Doing so improves Ran’s teaching of selecting the best configuration by incorporating Lu’s teaching of frame extraction and offloading, thereby improving the performance of batch processing with respect to completion time and energy consumption (Lu: pg. 1435, B. Adaptive Algorithm: “starts processing, the frames should not to be offloaded to avoid duplicate processing. Therefore, it is difficult to determine the number of frames included in batch processing, since offloading frames may improve performance during batch processing”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
U.S. Patent Application Publication 2007/0286489 to Amini et al. (hereinafter “Amini”) teaches filtering video packets for video stream monitoring based on the features extracted from the specific video packet retrieved from the histogram.
U.S. Patent Application Publication 2008/0089552 to Nakamura et al. (hereinafter “Nakamura”) teaches a digital watermark technique of sequentially obtaining frame images of moving image data and frame display, and generating output of feedback information to the user.
U.S. Patent Application Publication 2013/0166711 to Wang et al. (hereinafter “Wang”) teaches a three-tier intelligent video surveillance management system configured to obtain video content and metadata by filtering the metadata according to criteria.


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW N HO whose telephone number is (571)270-0590.  The examiner can normally be reached on M-F 10:30 -7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital can be reached on (571)272-4215.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

2/13/2021
/ANDREW N HO/
Examiner, Art Unit 2162