DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Currently, claims 1, 2 and 4-20 are pending in the application. Claim 3 is cancelled. 
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/27/2021 has been entered.

Response to Arguments / Amendments
Applicant’s arguments have been fully considered but are rendered moot in view of the new ground of rejection necessitated by amendments initiated by the applicant.

Claim Interpretations - 35 USC § 112 ¶ (f)
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
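The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.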

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked.  

As explained in MPEP 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph: 
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as "configured to" or "so that"; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.  

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. 
 	If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-13 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hannuksela et al. (US 20210195206, hereinafter Hannuksela) in view of Sekar et al. (US 20190289359, hereinafter Sekar).

Regarding Claim 1, Hannuksela discloses a method for video processing at a device ([0185],  FIG. 5), comprising: 
receiving a bitstream comprising a set of video frames; batching the set of video frames into a first subset of video frames and a second subset of video frames based at least in part on a change in a reference scene associated with the set of video frames ([0231], encoding adaptively turns learning on or off (defining first and second subsets of the frames when learning is switched on and switched off) in a dynamic manner such that, in the case the encoder detects or expects that online training for a certain frame/slice/unit is not beneficial, it can turn the learning off and indicate the decision in or along the bitstream to the decoder; [0235], only certain pictures are used for online training);
([0231], [0235], only the pictures at the lowest temporal sub-layer, such as the temporal sub-layer with Temporal Id equal to 0 in HEVC, are used selectively for online training);
generating the first subset of video frames using a video processing unit of the device ([0185], FIG. 5, deriving (502) a first encoded prediction error block through encoding a difference of the first prediction block and a first input block; encoding (504) the first encoded prediction error block into a bitstream); 
generating the second subset of video frames using the neural processing unit of the device ([0185], FIG. 5, retraining (510) the neural net with the training signal to obtain a second set of parameters for the neural net; deriving (512) a second prediction block at least partly based on an output of the neural net using the second set of parameters); and 
generating a set of video packets comprising the first subset of video frames, the second subset of video frames, or both ([0185], FIG. 5, encoding (504) the first encoded prediction error block into a bitstream; deriving (508) a training signal from one or both of the first encoded prediction error block and/or the first reconstructed prediction error block, and encoding (516) the second encoded prediction error block into a bitstream).
Hannuksela does not explicitly disclose generating the first and the second frame subsets in parallel. 
Sekar, from the same field of endeavor, teaches generating the first and the second frame subsets in parallel ([0024], a machine-learning-enabled method and system for generating content data about a video, including segmenting the video into K scene segments using a neural network trained to segment video based on scene variations, and performing human attribute recognition and face representation in parallel on each of the N groups of video frames using pre-trained machine learning algorithms; [0078], FIG. 12, the machine-learning-based content data generation system 118 includes multiple services that are controlled by a service controller 1100, which provides the video data to two services in parallel, namely a frame splitting service 1104 and a video scene segmentation service 1106).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the parallel generation of the subsets of video frames taught by Sekar ([0078]) into the video processing system of Hannuksela in order to provide an improved user experience and an improved or more efficient content source and communications network, as well as more efficient use of system resources (Sekar, [0063]).
Regarding Claim 2, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further comprising: identifying a long-term reference frame of the set of video frames, wherein batching the set of video frames into the first subset of video frames and the second subset of video frames is based at least in part on identifying the long-term reference frame ([0141] an indication of a reference picture type, such as a long-term reference picture and an inter-layer reference picture- indicated per reference picture).

Regarding Claim 4, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further wherein generating the second subset of video frames using the neural processing unit comprises: generating a first frame of the second subset of video frames based at least in part on a long-term reference frame of the set of video frames ([0242], a reference to a long-term reference picture with picture order count difference equal to 0 compared to that of the current picture is inferred to concern an inter-view or inter-layer reference picture).

Regarding Claim 5, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further wherein generating the first subset of video packets and the second subset of video packets comprises: synchronizing the first subset of video frames, the second subset of video frames, or both, based at least in part on temporal information associated with the first subset of video frames and the second subset of video frames ([0144], a reference to a long-term reference picture with picture order count difference as temporal information in synchronizing the frame subsets).  

Regarding Claim 6, Hannuksela in view of Sekar discloses the method of claim 5.
Hannuksela discloses further comprising: outputting the first subset of video packets, the second subset of video packets, or both based at least in part on the ([0242], temporal synchronization from reference picture with picture order count).  

Regarding Claim 7, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further wherein generating the first subset of video packets and the second subset of video packets comprises: generating the first subset of video frames over a first time duration; and generating the second subset of video frames over a second time duration, the first time duration at least partially overlapping in time with the second time duration ([0235], only the pictures at the lowest temporal sub-layer, such as the temporal sub-layer forming the first subset, which overlaps in time with the second subset).
  
Regarding Claim 8, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further wherein selecting the mode of operation for the neural processing unit comprises: identifying the change in the reference scene; and selecting a training mode for the neural processing unit based at least in part on the identified change in the reference scene ([0231], the content is analyzed in order to adapt the extent of fine-tuning the neural nets such as utilizing one or more parameters based on scene cut detection).
Sekar also discloses that segmentation is based on a frame-by-frame comparison of the input video, with the video being split into many continuous scene segments 301(1) to 301(K) on the time axis according to scene changes ([0078], FIG. 12).
Regarding Claim 9, Hannuksela in view of Sekar discloses the method of claim 8.
Hannuksela discloses further comprising: decoding the first subset of video frames by a video processing unit of the device; and training a learning model associated with the neural processing unit during the training mode based at least in part on at least one decoded video frame of the decoded first subset of video frames  ([0231], encoding adaptively turns learning on or off; and turn the learning off and indicate the decision in or along the bitstream to the decoder).

Regarding Claim 10, Hannuksela in view of Sekar discloses the method of claim 8.
Sekar discloses further comprising: selecting a generation mode for the neural processing unit based at least in part on a frame count satisfying a threshold, the frame count comprising a number of frames following the identified change in the reference scene ([0078], scene-specific video segment 301(j) serves as the basic unit on which an independent face tracking task is performed, and video scene segmentation service 1106 is implemented using a pre-trained machine-learning-based system that is generated using a machine learning algorithm and sample data; [0082], FIG. 13).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the parallel generation of the subsets of video frames taught by Sekar ([0078]) into the video processing system of Hannuksela in order to provide an improved user experience and an improved or more efficient content source and communications network, as well as more efficient use of system resources (Sekar, [0063]).
Regarding Claim 11, Hannuksela in view of Sekar discloses the method of claim 10.
Hannuksela discloses further comprising: decoding the first subset of video frames using a video processing unit of the device; and generating, using the neural processing unit of the device, at least one video frame of the second subset of video frames during the generation mode based at least in part on at least one decoded video frame of the decoded first subset of video frames ([0231], the encoder and decoder indicate one or more parameters controlling the effect of the on-line learning on the parameters or weights of the neural net based on scene cut detection).

Regarding Claim 12, Hannuksela in view of Sekar discloses the method of claim 10.
Hannuksela discloses further comprising: determining to switch the mode of operation for the neural processing unit from the training mode to the generation mode based at least in part on header information associated with one or more frames of the set of video packets ([0242], Sets of parameters and/or weights depending on the prediction type and/or prediction reference type: A distinct set of parameters and/or weights may be maintained for each type of prediction where NN-based prediction is applied. For example, a different NN-based predictor may be used for intra prediction, temporal inter prediction, inter-view sample prediction, and inter-layer sample prediction for quality/spatial scalability. In some cases, the prediction type may be concluded based on the prediction reference. For example, in HEVC or alike, a reference to a long-term reference picture with picture order count difference equal to 0 compared to that of the current picture, may be inferred to concern an inter-view or inter-layer reference picture).

Regarding Claim 13, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further comprising: training a learning model associated with the neural processing unit based at least in part on the first subset of video frames processed by a video processing unit of the device ([0011], training multiple models using the same encoded prediction error block as basis for the training signal used for the training; and maintaining competing sets of parameters and/or weights for the models of the neural net).

Regarding Claim 16, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further wherein: batching the set of video frames is performed by a decoder of the device ([0231], the encoder and decoder indicate one or more parameters controlling the effect of the on-line learning on the parameters or weights of the neural net based on scene cut detection).

Regarding Claim 17, Hannuksela in view of Sekar discloses the method of claim 1.
Hannuksela discloses further comprising: identifying a long-term reference frame of the set of video frames based at least in part on an accuracy threshold, the long-term  ([0141] an indication of a reference picture type, such as a long-term reference picture and an inter-layer reference picture- indicated per reference picture).

Regarding Claim 18, Hannuksela in view of Sekar discloses the method of claim 17.
Hannuksela discloses further wherein the long-term reference frame is identified by an encoder of the device ([0242]; [0245], a convolutional neural net may be used for certain prediction modes, such as intra prediction).
Regarding Claim 19, apparatus claim 19 corresponds to the method of claim 1, and the rejection of claim 1 is incorporated herein for the same reasons of obviousness as set forth above.
Hannuksela further discloses a processor and memory coupled with the processor ([0035]).
Regarding Claim 20, apparatus claim 20 corresponds to the method of claim 1, and the rejection of claim 1 is incorporated herein for the same reasons of obviousness as set forth above.



Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hannuksela et al. (US 20210195206, hereinafter Hannuksela) in view of Sekar et al. (US 20190289359, hereinafter Sekar) and Ripple et al. (US 20200272903, hereinafter Ripple).

Regarding Claim 14, Hannuksela in view of Sekar discloses the method of claim 1 but does not explicitly disclose further comprising: estimating motion vector information associated with the set of video frames; and identifying the change in the reference scene based at least in part on the motion vector information and a learning model associated with the neural processing unit.
Ripple, from the same field of endeavor, teaches estimating motion vector information associated with the set of video frames ([0046], frame extractor model 220 includes a set of reference frame generator models R_1, R_2, ..., R_n, a set of motion flow generator models MF_1, MF_2, ..., MF_n, a set of optical flow generator models OF_1, OF_2, ..., OF_n, a weight map generator model WG, and a residual frame generator model RG that perform the one or more operations of the frame extractor model 220); and identifying the change in the reference scene based at least in part on the motion vector information and a learning model associated with the neural processing unit ([0046], generating different types of intermediate frames at each step in the encoding process that can be combined or transformed to generate the set of reconstructed frames. In one embodiment, the set of reference frame generator models, the set of motion flow generator models, the set of optical flow generator models, the weight map generator model, and the residual generator model are each configured as a convolutional neural network (CNN)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the motion vector estimation taught by Ripple ([0046]) into the video processing system of Hannuksela and Sekar in order to adjust the relative amount between the two types of information depending on the content of the target frame, while the relative amount of information spent on motion vectors and the residual frame remains relatively constant (Ripple, [0063]).
Regarding Claim 15, Hannuksela in view of Sekar and Ripple discloses the method of claim 14.
Ripple further discloses determining a difference between first motion vector information associated with a first video frame of the set of video frames and second motion vector information associated with a second video frame of the set of video frames, wherein identifying the change in the reference scene is based at least in part on the determined difference ([0046], frame extractor model 220 includes a set of reference frame generator models R_1, R_2, ..., R_n, a set of motion flow generator models MF_1, MF_2, ..., MF_n, a set of optical flow generator models OF_1, OF_2, ..., OF_n, a weight map generator model WG, and a residual frame generator model RG that perform the one or more operations of the frame extractor model 220).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samuel D Fereja whose telephone number is (469)295-9243. The examiner can normally be reached 8AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID CZEKAJ, can be reached on (571) 272-7327. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, 




/SAMUEL D FEREJA/Examiner, Art Unit 2487