DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Applicant’s election without traverse of Group I, directed to claims 1-10 and 18-20, in the reply filed on 26 July 2021 is acknowledged. Claims 1-20 are all the claims pending in the application, of which claims 11-17 are withdrawn. Claims 1-5, 7-10, and 18-20 are rejected. Claim 6 is objected to.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 7, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2019/0005653 to Choi et al. (hereinafter “Choi”) in view of U.S. Patent Application Publication No. 2020/0106930 to Hohjoh et al. (hereinafter “Hohjoh”).
As to independent claim 1, Choi discloses a method, comprising: receiving a compressed digital video ([0032] discloses receiving image data using a CCTV (video) camera; see Fig. 1; [0034] discloses that the image data is encoded using standard image formats such as ; decompressing the compressed digital video to generate a decompressed digital video including a plurality of digital images ([0005, 0045] discloses decoding the encoded image data to produce foreground extraction target frames on which foreground extraction is subsequently performed), wherein decompressing the compressed digital video comprises: identifying motion data from the compressed digital video; and generating pixel data for the plurality of digital images based at least in part on the motion data ([0067-0068] discloses that the decoding process includes extracting encoded parameters comprising motion vectors and generating the decoded foreground extraction target frames using the motion vectors); identifying a subset of the pixel data from the decompressed digital video based on the identified motion data ([0080-0082] disclose extracting foreground pixels/blocks based on the extracted motion vectors; see Fig. 7). 
Choi does not expressly disclose providing the subset of the pixel data as input to an image processing model trained to generate an output based on input pixel data. 
Hohjoh, like Choi, is directed to foreground/object identification using motion vectors extracted from video (Abstract and [0042]). Hohjoh discloses using the motion vectors to identify identification target region candidates, and inputting the candidate regions into a pre-trained CNN-based image classifier which outputs a category/class of the detected object ([0059-0063, 0154-0164]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Choi to provide the extracted foreground regions to a pre-trained CNN model which categorizes the objects in the regions, as taught by Hohjoh, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results.
As to claim 2, the proposed combination of Choi and Hohjoh further teaches that the motion data comprises motion vector data extracted from the compressed digital video while decompressing the compressed digital video, wherein the motion vector data is associated with localized movement of content represented by the pixel data between subsequent digital images of the plurality of digital images ([0067-0068] of Choi discloses that the motion vectors are extracted from the video as part of the decoding process and that the motion vectors are a difference value between frames; [0044] of Hohjoh more explicitly defines a motion vector as displacement from a block in a frame to a corresponding block in another frame following that frame; the reasons for combining the references are the same as those discussed above in conjunction with claim 1). 
As to claim 7, the proposed combination of Choi and Hohjoh further teaches that identifying the subset of the pixel data comprises identifying portions of digital images from the plurality of digital images based on motion data identified for the plurality of digital images; and wherein providing the subset of the pixel data as input to the image processing model comprises providing the identified portions of the digital images as input to the image processing model without providing pixel data for one or more additional portions of the digital images as input to the image processing model ([0080-0082] of Choi disclose extracting foreground pixels/blocks based on the extracted motion vectors; see Fig. 7; [0059-0063, 0154-0164] of Hohjoh similarly disclose using the motion vectors to identify identification target region candidates, and further disclose inputting only the candidate regions into a pre-trained CNN-based image classifier which outputs a category/class of the detected object).
As to claim 10, Choi further discloses a computing device that receives the compressed digital video and decompresses the compressed digital video to generate the decompressed digital video ([0043] discloses an apparatus for receiving the encoded (compressed) video and decoding (decompressing) the video). Choi does not expressly disclose that the image processing model comprises a deep learning model, and wherein the deep learning model is implemented on the computing device. However, Hohjoh discloses a pre-trained CNN-based image classifier which outputs a category/class of the detected object, the classifier being part of an object identification unit 13 which is part of the same device 1 that includes the window specification unit 12 that extracts the object candidate regions ([0059-0063, 0154-0164] and Fig. 1). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Choi to include a pre-trained CNN model which categorizes the objects in the regions on the same device that extracts the candidate regions, as taught by Hohjoh, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to allow a single user to perform the object detection and classification.

Independent claim 18 recites a system, comprising: one or more processors; memory in electronic communication with the one or more processors; and instructions stored in the memory, the instructions being executable by the one or more processors to cause a computing device to ([0012] of Choi discloses that the disclosed algorithm is performed by an . 

Claims 3, 4, 8, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Choi in view of Hohjoh and further in view of U.S. Patent Application Publication No. 2008/0043848 to Kuhn et al. (hereinafter “Kuhn”).
As to claim 3, although Choi discloses extracting encoding parameters from video during decoding/decompression ([0067-0068]), the proposed combination of Choi and Hohjoh does not expressly disclose that the motion data further comprises camera movement data extracted from the compressed digital video while decompressing the compressed digital video, wherein the camera movement data is associated with global movement of content represented by the pixel data between subsequent digital images of the plurality of digital images. 
Kuhn, like Choi, is directed to extracting encoded parameters such as motion information from video in the compressed domain for foreground segmentation (Abstract and [0066-0071, 0077]). Kuhn discloses that the motion information includes camera motion data which is associated with global movement of pixel data between frames ([0074, 0089-0096]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Choi and Hohjoh to further extract camera motion encoded in the compressed domain, the camera motion being associated with global movement of content between consecutive frames of video, as taught by Kuhn, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results.

As to claim 4, the proposed combination of Choi, Hohjoh, and Kuhn further teaches that identifying the subset of the pixel data comprises identifying a subset of digital images from the plurality of digital images based on motion data corresponding to the identified subset of digital images; and wherein providing the subset of the pixel data comprises providing the subset of digital images as input to the image processing model ([0077-0079, 0091-0092] of Kuhn discloses using the motion data to identify scenes (a subset of frames) within the video and extract foreground therefrom; [0059-0063, 0154-0164] of Hohjoh discloses a pre-trained CNN-based image classifier which outputs a category/class of detected objects in video frames; the reasons for combining the references are analogous to those discussed above in conjunction with claim 3). 
As to claim 8, the proposed combination of Choi, Hohjoh, and Kuhn further teaches identifying a scene change within the decompressed digital video based on the motion data; and providing the subset of the pixel data as input to the image processing model in response to identifying the scene change ([0092] of Kuhn discloses detecting scene changes based on the motion vectors; [0059-0063, 0154-0164] of Hohjoh discloses a pre-trained CNN-based image classifier which outputs a category/class of detected objects in video frames; the reasons for combining the references are analogous to those discussed above in conjunction with claim 3).

As to claim 19, the proposed combination of Choi, Hohjoh, and Kuhn further teaches that the motion data comprises motion vector data extracted from the compressed digital video, wherein the motion vector data is associated with localized movement of content represented by the pixel data between subsequent digital images of the plurality of digital images ([0067-0068] of Choi discloses that the motion vectors are extracted from the video as part of the decoding process and that the motion vectors are a difference value between frames; [0044] of Hohjoh more explicitly defines a motion vector as displacement from a block in a frame to a corresponding block in another frame following that frame; the reasons for combining the references are the same as those discussed above in conjunction with claim 1); and wherein the motion data comprises camera movement data extracted from the compressed digital video, wherein the camera movement data is associated with global movement of content represented by the pixel data between subsequent digital images of the plurality of digital images ([0071, 0074, 0089-0096] of Kuhn discloses that the motion information includes camera motion data which is associated with global movement of pixel data between frames, the global motion data being encoded in the compressed domain; [0067-0068] of Choi discloses that the encoding parameters are extracted from the video as part of the decoding process; the reasons for combining the references are the same as those discussed above in conjunction with claim 3). 

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Choi in view of Hohjoh and Kuhn and further in view of U.S. Patent Application Publication No. 2002/0024999 to Yamaguchi et al. (hereinafter “Yamaguchi”).
As to claim 5, the proposed combination of Choi, Hohjoh, and Kuhn does not expressly disclose that identifying the subset of the pixel data comprises selectively identifying digital images from the plurality of digital images at a first frame rate less than a second frame rate of the plurality of digital images. 
Yamaguchi, like Kuhn, is directed to scene classification using motion vectors ([0082-0094]). Yamaguchi discloses that a frame rate is reduced if low motion exists in the frame, while the frame rate is increased when motion is high ([0106-0108]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Choi, Hohjoh, and Kuhn to perform the algorithm at a first frame rate lower than a second frame rate if motion is high, as taught by Yamaguchi, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to prevent motion from affecting identification results. 

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Choi in view of Hohjoh and further in view of U.S. Patent Application Publication No. 2020/0097769 to Lipchin et al. (hereinafter “Lipchin”).
As to claim 9, the proposed combination of Choi and Hohjoh further teaches that the image processing model comprises a deep learning model ([0058, 0148] of Hohjoh discloses that the pre-trained model uses deep machine learning, such as a convolutional neural network). The proposed combination of Choi and Hohjoh does not expressly disclose that the deep learning model is implemented on a cloud computing system. 

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the proposed combination of Choi and Hohjoh to implement Hohjoh’s CNN on a cloud network, as taught by Lipchin, to arrive at the claimed invention discussed above. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to save computational resources on the client side.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Choi in view of Hohjoh and further in view of Yamaguchi.
As to claim 20, the proposed combination of Choi and Hohjoh does not expressly disclose instructions that, when executed, cause the computing device to: identify a scene within the decompressed digital video based on the motion data; determine a frame rate of digital images for a portion of the decompressed digital video corresponding to the identified scene based on the motion data; and wherein providing the subset of the pixel data as input to the image processing model comprises providing a subset of digital images from the plurality of digital images corresponding to the portion of the decompressed digital video and based on the determined frame rate. 
Yamaguchi, like Kuhn, is directed to scene classification using motion vectors ([0082-0094]). Yamaguchi discloses that a frame rate is reduced if low motion exists in the frame, while the frame rate is increased when motion is high ([0106-0108]).

Allowable Subject Matter 
Claim 6 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Pertinent Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Abiko (U.S. Patent Application Publication No. 2002/0036717) discloses inputting a compressed moving image encoded with inter-frame prediction and decoding the compressed moving image. Abiko further discloses extracting motion vector data for each frame during the decoding process and segmenting scenes in the video based on the motion vector data (See [0093-0100] and Fig. 3).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN M CONNER whose telephone number is (571)272-1486.  The examiner can normally be reached on noon - 8:30 PM Monday through Thursday and Saturday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached on (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.






/SEAN M CONNER/Primary Examiner, Art Unit 2663