DETAILED ACTION

Response to Amendment
Applicant’s response to the last Office Action, filed on 04/25/2022, has been entered and made of record.
Examiner maintains original grounds of rejection and no new grounds are added; therefore this action is made final.
Interpretation under 35 USC 112(f) is withdrawn in view of amendments.

Response to Arguments
Applicant's arguments filed on 04/25/2022 have been fully considered but they are not persuasive.
Applicant has directed arguments specifically to the Jakubowicz reference. Applicant emphasizes that Jakubowicz teaches a method of training/learning neural networks which use a parallel architecture, and appears to argue that Jakubowicz does not actually teach using the object detection neural networks for detecting objects but rather strictly teaches training them. Examiner disagrees that Jakubowicz is restricted in this way and notes that this appears to be a misunderstanding of the disclosure.
First, for context, Examiner notes that Jakubowicz’s neural network (and object detection neural networks in general) undergoes a learning step in which the model is trained by performing object detection on pre-labelled images, in order to compare the result against the existing label (see ¶ 0019-0020). The goal of this step is to generate a learned model which can then be used in operational object detection. Examiner notes that ¶ 0026 explicitly teaches operational object detection, “The invention can be applied to the surveillance of wide scale video contents, as available in the social networks, and to online advertising in videos . . . The objects to be detected in the videos can correspond to or resemble objects of a sales catalog.”
Later, in detailing the specifics of its process, Jakubowicz makes a distinction between the “Device for Learning Descriptors” at ¶ 0069-0072 and “Detection and Location of the Objects” at ¶ 0073-0077. ¶ 0072 (and ¶ 0060-0062) teaches learning “class descriptors” for object detection using a parallel neural network architecture. The descriptors are obtained from parameters of the objects in the image, and “class descriptors” are learned in order to describe various object types. Moving on to the steps for “Detection and Location of the Objects,” ¶ 0075 teaches that object descriptors are computed from the test image by using the learned “class descriptors” previously described.
As noted, ¶ 0072 (and ¶ 0060-0062) teach using a parallel architecture to learn these “class descriptors.” Thus the object detection at ¶ 0073-0077 directly uses the parallel neural network architecture described. Further, ¶ 0077 even states explicitly, “The features described above for the method for learning class descriptors apply to the method for detecting and locating objects.” The features being referred to include the parallel architecture for object detection, which makes up the majority of the author’s description here.
Further, Examiner notes that even if Jakubowicz were exclusively directed to learning (which is not the case, as noted above), the learning process still contains an explicit teaching of object detection. As noted above, the learning process is itself based on object detection (see also ¶ 0048). Thus even if Jakubowicz somehow only mentioned learning, a parallel architecture neural network for object detection would still be fully described. The learning part of the process has the additional step that detected objects are also compared to an annotated ground truth image. This is an additional step not present during operational detection. Thus by arguing Jakubowicz is limited to learning, Applicant is not successfully showing that Jakubowicz is missing anything, but rather that it contains additional steps not present in the claims.
Examiner disagrees with Applicant’s suggestion at pg. 4 of the Remarks that the claim language requires some form of real-time arrangement of the neural network configuration, rather than arrangement in advance. The claims recite no step for generating an arrangement that is not configured in advance; the claim simply says “arrange”. This can clearly encompass using a pre-generated arrangement. Regardless, Jakubowicz teaches arranging in a parallel architecture as seen in Fig. 5 and ¶ 0060-0062.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ananthanarayanan (US PGPub 2019/0205649) in view of Millin (US PGPub 2019/0327179), Zhou (US PGPub 2019/0130191) and Jakubowicz (US PGPub 2020/0210774).
Regarding claim 1, Ananthanarayanan discloses a cloud-computing-based video processing system comprising: (Ananthanarayanan teaches a video stream analysis technique which uses convolutional neural networks (CNNs) to process video.)
a register, the register registering configuration information associated with video information received by the cloud computing-based video processing system, the configuration information comprising at least one of access information, communication information, metadata, area of interest, analysis information, processing information;  (¶ 0056, 0043 and Fig. 3 teach extracting object areas of interest from whole video frames in order to process the objects. This is a process registering configuration information which comprises areas of interest.)
a scaler scaling computing resources based on the video information received by the cloud computing-based video processing system, the computing resources comprising (Ananthanarayanan ¶ 0023-0025 discusses scaling up computational resources to process the video stream, for example at query time.)
a filter, the filter filtering a video frame of the video information such that all but an area of interest is excluded in a filtered video frame; (As above, ¶ 0056, 0043 and Fig. 3 teach extracting object areas of interest from whole video frames in order to process the objects.)
a processor, the processor processing the filtered video frame using the configured plurality of neural networks that provides insight information, the insight information comprising movements of the objects in the video information; (Fig. 3 and ¶ 0055 show the processing pipeline for the neural networks. ¶ 0111 and 0049 teach that the insight information comprises object detection based on object movement detection.)
a display, the display providing output to a user. (¶ 0157 teaches an output display.)
a memory, the memory storing the configuration information and insight information in persistent cloud-based storage. (¶ 0155 teaches that storage includes “cloud-based storage accessible via a network, such as the Internet.” ¶ 0008 teaches storing configuration and insight information.)
In the field of cloud computing Millin teaches a scaling component, the scaling component requesting and scaling computing resources for the cloud computing-based processing system (¶ 0119, “on-demand computing resources can afford flexibility to managed network 300, including the ability to quickly scale cloud services up or down through the click of a button, an API call, or an enterprise rule.”)
It would have been obvious to one of ordinary skill in the art to have combined Ananthanarayanan’s video stream processing with Millin’s system for data processing with a cloud-based server. Ananthanarayanan teaches using cloud computing for processing video stream analysis systems. It is directed to balancing tradeoffs in scaling computing resources during different parts of the video stream analysis cycle. Millin teaches the well-known and widely-used technique of scaling up and down cloud server resources in response to a request. The combination constitutes the repeatable and predictable result of simply applying Millin’s teaching here. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
In the field of video stream analysis Zhou teaches that the insight information comprises object counts, and teaches displaying the insight information to the user. (Zhou teaches a system for video stream analysis which detects a bounding box for an object based on object movement, see ¶ 0063-0064. The bounding box insight information is displayed, see ¶ 0120. ¶ 0067 teaches object counting.)
It would have been obvious to one of ordinary skill in the art to have combined Ananthanarayanan’s video stream analysis with Zhou’s video stream analysis. Ananthanarayanan teaches video stream object detection and an output display but does not mention explicitly what is being displayed. Zhou teaches video stream object detection, displaying video stream analysis insights and teaches object counting. The combination constitutes the repeatable and predictable result of simply applying Zhou’s teaching here. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
In the field of video stream analysis Jakubowicz teaches a configurator adapted to arrange a plurality of neural networks in at least one of a parallel configuration, sequential configuration, or a mixed parallel and sequential configuration that provides a configured plurality of neural networks, wherein each neural network is operable to detect a different object in the video information; (Jakubowicz is a system for configuring a plurality of neural networks performing object detection by arranging the networks in a parallel configuration, with each neural network configured to detect a different object type, see ¶ 0022 and 0092 and Fig. 5)
It would have been obvious to one of ordinary skill in the art to have combined the above combination’s video stream analysis with Jakubowicz’s video stream analysis. Ananthanarayanan teaches video stream object detection with multiple neural networks. Jakubowicz teaches video stream object detection and specifically teaches a computationally efficient implementation with multiple neural networks in a parallel configuration, each for a different object type. The combination constitutes the repeatable and predictable result of simply applying Jakubowicz’s teaching here. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
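For illustration only, the parallel arrangement recited in claim 1 (several networks run side by side on the same frame, each responsible for a different object type) can be sketched as below. This sketch is not taken from Jakubowicz or Ananthanarayanan; the names `make_detector` and `run_parallel`, the object types, and the representation of a frame as a list of labels are all hypothetical stand-ins for trained networks and real image data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a trained network: each "detector" looks for
# exactly one object type. A frame is modelled as a plain list of labels.
def make_detector(object_type):
    def detect(frame):
        # Return the positions at which this detector's object type occurs.
        return [i for i, label in enumerate(frame) if label == object_type]
    return detect

def run_parallel(detectors, frame):
    """Run every per-type detector on the same frame concurrently."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in detectors.items()}
        return {name: fut.result() for name, fut in futures.items()}

# One detector per object type, arranged in advance (assumed object types).
detectors = {t: make_detector(t) for t in ("car", "person", "bicycle")}
frame = ["car", "person", "car", "tree"]
results = run_parallel(detectors, frame)
```

In this sketch the arrangement is built once, ahead of time, and then reused for each frame, which corresponds to the pre-generated arrangement that the word “arrange” can encompass.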
Regarding claim 2, the above combination discloses the cloud-computing based video processing system, as defined by Claim 1, further comprising representing the area of interest as a polygon. (Hollander ¶ 0050 teaches representing the area of interest as a bounding box.)
Regarding claim 3, the above combination discloses the cloud-computing based video processing system, as defined by Claim 1, wherein the video information comprises at least one of a live video stream, pre-recorded video stream, standalone individual video frame, or an image. (See Ananthanarayanan, ¶ 0041)
Regarding claim 4, the above combination discloses the cloud-computing based video processing system, as defined by Claim 1, wherein the insight information is based on an object detected in the filtered video frame and attributes associated with the object. (Ananthanarayanan Fig. 3 and ¶ 0061 teach using the neural networks to detect objects of a certain class X and their associated frame.)
Regarding claim 5, the above combination discloses the cloud-computing based video processing system, as defined by Claim 1, wherein the plurality of neural networks comprises a deep neural network. (As above, Ananthanarayanan ¶ 0024, ¶ 0055 and Fig. 3 teach configuring multiple CNN neural networks, each a deep neural network.)
Regarding claim 6, the above combination discloses the cloud-computing based video processing system, as defined by Claim 1, wherein the video processing apparatus trains the plurality of neural networks to process an image comprising at least one of a predefined dimension, or a dynamic dimension. (Hollander ¶ 0044 and Ananthanarayanan ¶ 0154)
Claims 7-9, 11, 13, and 14 are the method claims corresponding to apparatus claims 1-6. The apparatus necessarily requires method steps. Remaining limitations are rejected similarly. See detailed analysis above.
Regarding claim 10, the above combination discloses the method, as defined by Claim 7, further comprising: configuring video content metadata that provides configured video content used in processing the filtered video frame, configuring the area of interest, and storing the configured video content metadata and the configured area of interest in the persistent cloud-based storage. (Ananthanarayanan ¶ 0056 and Fig. 3 teach extracting object areas of interest from whole video frames in order to process the objects. Configured metadata includes the top-k index data and frame data 328, among other data. ¶ 0008 teaches storing configuration metadata and areas of interest data.)
Regarding claim 12, the above combination discloses the method, as defined by Claim 7, further comprising providing the insight information in response to receiving a request for video frame processing. (Ananthanarayanan ¶ 0055 and Fig. 3 teach that the insight information is generated in response to a query 320 for video analysis.)
Regarding claim 15, the above combination discloses the method, as defined by Claim 7, further comprising training the plurality of neural networks to process at least one of a black-white image, color image. (Ananthanarayanan teaches processing streams from traffic cameras, surveillance cameras, and news channels. The prior art does not expressly disclose that the video images are one of black-white images or color images, but Examiner notes that the concept of using either black and white or color images for traffic, surveillance or news would have been obvious to incorporate with predictable results and without undue experimentation. This is not considered a non-obvious improvement over the prior art. Official Notice is applied here.)
Regarding claim 16, the above combination discloses the method, as defined by Claim 7, further comprising: 
scaling up a computational resource associated with the plurality of neural networks in response to receiving configuration information comprising video information to be processed, the scaling up comprising requesting a cloud provider API to provide additional computational resources; and (Ananthanarayanan ¶ 0023-0025 discusses scaling up computational resources to process the video stream, for example at query time. Millin, ¶ 0119, “on-demand computing resources can afford flexibility to managed network 300, including the ability to quickly scale cloud services up or down through the click of a button, an API call, or an enterprise rule.”)
scaling down computational resources in response to receiving a stop command, the scaling down comprising requesting the cloud provider API to release existing computational resources used for processing the filtered video frame. (As above, Millin, ¶ 0119, “on-demand computing resources can afford flexibility to managed network 300, including the ability to quickly scale cloud services up or down through the click of a button, an API call, or an enterprise rule.”)
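For illustration only, the scale-up and scale-down behavior recited in claim 16 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Ananthanarayanan or Millin: `FakeCloudAPI` is a hypothetical in-memory stand-in for a cloud provider API, and `Scaler`, `on_configuration`, `on_stop`, and `workers_per_stream` are illustrative names that appear in neither reference nor in the claims.

```python
class FakeCloudAPI:
    """Hypothetical in-memory stand-in for a cloud provider API."""

    def __init__(self):
        self.allocated = 0

    def request_resources(self, count):
        # Provide additional computational resources.
        self.allocated += count
        return self.allocated

    def release_resources(self, count):
        # Release existing computational resources (never below zero).
        self.allocated = max(0, self.allocated - count)
        return self.allocated


class Scaler:
    """Scales resources up on new configuration and down on a stop command."""

    def __init__(self, api, workers_per_stream=1):
        self.api = api
        self.workers_per_stream = workers_per_stream  # assumed sizing rule
        self.in_use = 0

    def on_configuration(self, num_streams):
        # Scale up: request resources for the video streams to be processed.
        needed = num_streams * self.workers_per_stream
        self.api.request_resources(needed)
        self.in_use += needed

    def on_stop(self):
        # Scale down: release everything this scaler previously requested.
        self.api.release_resources(self.in_use)
        self.in_use = 0


api = FakeCloudAPI()
scaler = Scaler(api, workers_per_stream=2)
scaler.on_configuration(num_streams=3)
allocated_after_start = api.allocated
scaler.on_stop()
allocated_after_stop = api.allocated
```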
Regarding claim 17, the above combination discloses the method, as defined by Claim 7, further comprising: configuring a processing pipeline comprising the insight information that provides an aggregation, executing the configured processing pipeline, and storing the aggregation in the persistent cloud-based storage. (Ananthanarayanan Fig. 3 and ¶ 0061 teach using the neural networks to detect objects of a certain class X and their associated frame and return the aggregated collection of frames. ¶ 0008 teaches storage.)
Regarding claim 19, the above combination discloses the method, as defined by Claim 17, further comprising providing an API access to the aggregation in response to a request to initiate calculation and retrieval of the aggregation. (Ananthanarayanan ¶ 0055 and Fig. 3 teach that the aggregation insight information is generated in response to a query 320 for video analysis, a request to initiate calculation and retrieval. The prior art does not expressly disclose that the results are provided to an API, but Examiner notes that the concept of providing the results to a programming interface for interaction between applications would have been obvious to incorporate with predictable results and without undue experimentation. This is not considered a non-obvious improvement over the prior art. Official Notice is applied here.)
Claim 20 is the computer readable medium claim corresponding to the apparatus of claim 1. Ananthanarayanan ¶ 0006 teaches a computer readable medium. Remaining limitations are rejected similarly. See detailed analysis above.  

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ananthanarayanan (US PGPub 2019/0205649) in view of Millin (US PGPub 2019/0327179), Zhou (US PGPub 2019/0130191), Jakubowicz (US PGPub 2020/0210774) and WebGUI (“Features”).
Regarding claim 18, the above combination discloses the method, as defined by Claim 17, further comprising providing access to the aggregation in response to a request to initiate calculation and retrieval of the aggregation. (Ananthanarayanan ¶ 0055 and Fig. 3 teach that the aggregation insight information is generated in response to a query 320 for video analysis, a request to initiate calculation and retrieval.)
In the field of content management systems WebGUI teaches a content management system. (¶ 1, WebGUI is a content management system and web application framework, which allows for easy management of content such as photo galleries.)
It would have been obvious to one of ordinary skill in the art to have combined the above combination’s video stream processing with WebGUI’s content management system. Ananthanarayanan and Hollander both teach displaying results of their video stream analysis systems. WebGUI is software for managing display of content. The combination constitutes the repeatable and predictable result of simply displaying image content with a content display software. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.

Claims 21 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ananthanarayanan (US PGPub 2019/0205649) in view of Millin (US PGPub 2019/0327179), Zhou (US PGPub 2019/0130191), Jakubowicz (US PGPub 2020/0210774), Malik (US PGPub 2009/0217315) and Knapp (US PGPub 2010/0266189).
Regarding claim 21, the above combination discloses a cloud-computing based video processing system comprising: (See rejection of claim 1)
a register that registers configuration information associated with video information received by the cloud-computing based video processing system from a plurality of video streaming devices registered to the cloud-computing based video processing system in a virtual private network, wherein the configuration information comprises access information, communication information, metadata, area of interest information, analysis information, and processing information, (Ananthanarayanan ¶ 0122 teaches a plurality of video streaming devices. ¶ 0157 teaches access information and communication information. ¶ 0059 teaches metadata, analysis information and processing information. ¶ 0056, 0043 and Fig. 3 teach extracting object areas of interest from whole video frames in order to process the objects. This is a process registering configuration information which comprises areas of interest.)
a scaler that requests and scales computing resources based on the video information received by the cloud-computing based video processing system, the computing resources comprising: (See rejection of claim 1)
a filter that filters each video frame of the video information such that only information pertaining to the one or more polygons remains in each filtered video frame and such that all pixel values outside the one or more polygons in each filtered video frame have a zero value; (See rejection of claim 1)
a configurator adapted to arrange a plurality of neural networks in at least one of a parallel configuration and a sequential configuration, wherein a portion of the neural networks is operable to perform pre-processing of each video frame (see rejection of claim 1) the configuration device further assigning at least one neural network to each of the one or more polygons, wherein each neural network detects a different feature of its assigned polygon, and wherein the features include object types, and detected coordinates of objects; (See rejection of claim 1 and in particular the combination with Jakubowicz)
a processor that processes each filtered video frame using the configured plurality of neural networks to provide insight information, the insight information comprising object counts, object detections, and object types in the video information, wherein the processor tracks object movements in the video information and automatically formulates the insight information and object movements in a report; (See rejection of claim 1. Zhou ¶ 0013 teaches tracking object movements.)
a display that provides the report to a user; and a memory that stores the configuration information and insight information in persistent cloud-based storage. (See rejection of claim 1.)
In the field of video stream analysis Malik teaches that the area of interest information comprises user-defined coordinates of one or more polygons, and wherein each polygon comprises an object recorded in one or more video frames of the video information; (¶ 0057 teaches a user defining a polygon shaped region of interest.) and teaches detecting emotions. (¶ 0113 teaches detecting emotions and expressions from facial attributes.)
It would have been obvious to one of ordinary skill in the art to have combined the above combination’s video stream processing with Malik’s video stream processing. Ananthanarayanan teaches extracting regions of interest and classifying object types. Malik teaches that said regions of interest are user defined and teaches classifying objects that are faces into emotion types. The combination constitutes the repeatable and predictable result of simply applying Malik’s techniques here. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
In the field of video stream analysis Knapp teaches performing grayscale adjustment, resizing, and normalization of the image (Knapp is a system for performing normalization and image enhancement. See Fig. 1, ¶ 0022, 0032, 0034.)
It would have been obvious to one of ordinary skill in the art to have combined the above combination’s video stream processing with Knapp’s video stream processing. The above combination teaches video stream analysis with a variety of techniques. Knapp teaches performing the well-known and widely-used techniques of image pre-processing such as grayscale adjustment, resizing, and normalization. The combination constitutes the repeatable and predictable result of simply applying Knapp’s techniques here. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
Claim 22 is a method corresponding to the system of claim 21. The system necessarily contains the method steps. Remaining limitations are rejected similarly. See detailed analysis above. 
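For illustration only, the filter limitation of claim 21 (all pixel values outside the one or more polygons have a zero value) can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from any cited reference: the frame is modelled as a list of rows of integers, polygons are lists of user-defined (x, y) vertices, a pixel is treated as inside when its center lies inside a polygon, and the helper names `point_in_polygon` and `filter_frame` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_frame(frame, polygons):
    """Return a copy of the frame with pixels outside every polygon set to 0."""
    filtered = []
    for row_idx, row in enumerate(frame):
        new_row = []
        for col_idx, value in enumerate(row):
            # Assumption: a pixel is classified by the position of its center.
            cx, cy = col_idx + 0.5, row_idx + 0.5
            keep = any(point_in_polygon(cx, cy, p) for p in polygons)
            new_row.append(value if keep else 0)
        filtered.append(new_row)
    return filtered


frame = [[9] * 4 for _ in range(4)]            # 4x4 frame, all pixels = 9
square = [(1, 1), (3, 1), (3, 3), (1, 3)]      # assumed area of interest
filtered = filter_frame(frame, [square])
```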

Conclusion
Examiner maintains original grounds of rejection and no new grounds are added; therefore THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Raphael Schwartz whose telephone number is (571)270-3822.  The examiner can normally be reached on Monday to Friday 9am-5pm CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571) 272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/RAPHAEL SCHWARTZ/           Examiner, Art Unit 2661   

/VINCENT RUDOLPH/           Supervisory Patent Examiner, Art Unit 2661