DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites (claim 5, line 1) “The method of any one of claims 4”, suggesting a multiple dependent claim but listing only a single parent claim. Given that the present claims have been amended to eliminate various multiple dependencies, Examiner infers that the intent is that claim 5 depends from claim 4.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 7-8, & 11-12 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chen (US 20140071347, cited in 4/23/21 Information Disclosure Statement).
Claim 1: A method comprising:
receiving: (i) one or more reference video frames (Chen paragraph 0086, key frames), (ii) respective reference labels for each of a plurality of reference pixels in the reference video frames (Chen paragraphs 0082 & 0085, pixel color labels), and (iii) a target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation);
processing the reference video frames and the target video frame using a colorization machine learning model to generate respective pixel similarity measures (Chen paragraph 0070, LLE model calculating relationship reflecting pixel similarity) between each of (i) a plurality of target pixels in the target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), and (ii) the reference pixels in the reference video frames (Chen paragraph 0086, applying editing to key frames),
wherein the colorization machine learning model is trained to generate pixel similarity measures wherein a respective estimated color of each of target pixel in the target video frame is defined by combining: (i) actual colors of each of the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures); and
determining a respective target label for each target pixel in the target video frame (Chen paragraph 0103, propagating user specified (i.e. target) color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), comprising:
combining (i) the reference labels for the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures; Chen paragraphs 0082 & 0085, pixel color labels).
Claim 2: The method of claim 1 (see above), wherein the reference pixels in the reference video frames comprise a proper subset of the pixels in the reference video frames (Chen paragraphs 0085-0086, “some” objects, regions, or pixels, indicating a proper (i.e. less than the whole) subset).
Claim 3: The method of claim 1 (see above), wherein the reference video frames and the target video frames are decolorized prior to being processed by the colorization machine learning model (Chen paragraph 0085, gray image colorization (i.e. video frames prior to processing are decolorized)).
Claim 7: The method of claim 1 (see above), wherein a label for a pixel comprises data indicating, for each of multiple possible categories, a respective likelihood that the pixel corresponds to the category (Chen paragraph 0082, label colors for some pixels, indicating that some pixels are certainly part of the color category and others are certainly not part of the color category).
Claim 8: The method of claim 1 (see above), wherein a label for a pixel comprises data indicating, for each of multiple possible key points, a respective likelihood that the pixel corresponds to the key point (Chen paragraph 0082, label colors for some pixels, indicating that some pixels are certainly among key points to be labelled).
Claim 11: The method of claim 1 (see above), wherein the reference labels for the reference pixels in one or more of the reference video frames were previously determined using the colorization machine learning model (Chen paragraphs 0074-0075, pixel reconstruction based on model).
Claim 12: The method of claim 1 (see above), further comprising using the target labels to track a position of an object in the reference video frames to the target video frame (Chen paragraph 0084, specifying spatial position of object to which user specified (i.e. target) color is applicable).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 13-16 & 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Toklu (US 6549643).
With respect to claim 13, Chen discloses the described operations (which correspond to the operations set forth in the method of claim 1, see above).
Chen does not expressly disclose a data processing apparatus and a memory in communication therewith for performing these operations.
Toklu discloses (Toklu column 5, lines 8-17) the implementation of an image processing method using data processing hardware and a memory device in communication therewith.
Chen and Toklu are combinable because they are from the field of image and video processing.
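Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to implement the method of Chen using data processing hardware and a memory device in communication therewith, as taught by Toklu.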
The suggestion/motivation for doing so would have been to implement the Chen method using known standard hardware.
Therefore, it would have been obvious to combine Chen with Toklu to obtain the invention as specified in claim 13.
Claim 13: A system, comprising:
a data processing apparatus (Toklu column 5, lines 8-12, processing hardware); and
a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform operations (Toklu column 5, lines 12-17, software stored in memory for controlling the processing hardware) comprising:
receiving: (i) one or more reference video frames (Chen paragraph 0086, key frames), (ii) respective reference labels for each of a plurality of reference pixels in the reference video frames (Chen paragraphs 0082 & 0085, pixel color labels), and (iii) a target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation);
processing the reference video frames and the target video frame using a colorization machine learning model to generate respective pixel similarity measures (Chen paragraph 0070, LLE model calculating relationship reflecting pixel similarity) between each of (i) a plurality of target pixels in the target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), and (ii) the reference pixels in the reference video frames (Chen paragraph 0086, applying editing to key frames),
wherein the colorization machine learning model is trained to generate pixel similarity measures wherein a respective estimated color of each of target pixel in the target video frame is defined by combining: (i) actual colors of each of the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures); and
determining a respective target label for each target pixel in the target video frame (Chen paragraph 0103, propagating user specified (i.e. target) color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), comprising:
combining (i) the reference labels for the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures; Chen paragraphs 0082 & 0085, pixel color labels).
Applying the teachings relied upon for claim 13 above to claims 14-16 & 20:
Claim 14: One or more non-transitory computer storage media storing instructions (Toklu column 5, lines 12-17) that when executed by one or more computers (Toklu column 5, lines 8-12, processing hardware) cause the one or more computers to perform operations comprising:
receiving: (i) one or more reference video frames (Chen paragraph 0086, key frames), (ii) respective reference labels for each of a plurality of reference pixels in the reference video frames (Chen paragraphs 0082 & 0085, pixel color labels), and (iii) a target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation);
processing the reference video frames and the target video frame using a colorization machine learning model to generate respective pixel similarity measures (Chen paragraph 0070, LLE model calculating relationship reflecting pixel similarity) between each of (i) a plurality of target pixels in the target video frame (Chen paragraph 0103, propagating color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), and (ii) the reference pixels in the reference video frames (Chen paragraph 0086, applying editing to key frames),
wherein the colorization machine learning model is trained to generate pixel similarity measures wherein a respective estimated color of each of target pixel in the target video frame is defined by combining: (i) actual colors of each of the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures); and
determining a respective target label for each target pixel in the target video frame (Chen paragraph 0103, propagating user specified (i.e. target) color to other pixels in a video, other pixels in a video comprising frame(s) readable as target frames for propagation), comprising:
combining (i) the reference labels for the reference pixels in the reference video frames, and (ii) the pixel similarity measures (Chen paragraphs 0074-0075, pixel reconstruction based on colors of pixels and sum of squared pixel difference measures; Chen paragraphs 0082 & 0085, pixel color labels).
Claim 15: The non-transitory computer storage media of claim 14 (see above), wherein the reference pixels in the reference video frames comprise a proper subset of the pixels in the reference video frames (Chen paragraphs 0085-0086, “some” objects, regions, or pixels, indicating a proper (i.e. less than the whole) subset).
Claim 16: The non-transitory computer storage media of claim 14 (see above), wherein the reference video frames and the target video frames are decolorized prior to being processed by the colorization machine learning model (Chen paragraph 0085, gray image colorization (i.e. video frames prior to processing are decolorized)).
Claim 20: The non-transitory computer storage media of claim 14 (see above), wherein a label for a pixel comprises data indicating, for each of multiple possible categories, a respective likelihood that the pixel corresponds to the category (Chen paragraph 0082, label colors for some pixels, indicating that some pixels are certainly part of the color category and others are certainly not part of the color category).
Allowable Subject Matter
Claims 4-6, 9-10, & 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
With respect to claims 4 & 17 (and dependent claims 5-6, 9-10, & 18-19), the art of record does not teach or suggest the recited arrangement of an embedding neural network in which the reference video frames and target video frames are provided as neural network input, this input is processed in accordance with current parameters to generate a respective embedding in each of the target pixels and reference pixels, and pixel similarities are generated between target pixels and reference pixels using the embeddings in conjunction with the recited arrangement of colorization processing of a reference video frame and a target video frame such that pixel colors are determined by combining actual reference pixel colors and similarity measures and combining reference labels and similarity measures.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Liou (US 20020147834) and Takahashi disclose examples of detecting pixel color similarities.
Norton, Levin, and Liao (US 20210201071) disclose examples of colorization.
Any inquiry concerning the contents of this communication or earlier communications from the examiner should be directed to Stephen M. Brinich at 571-272-7430 (voice) or 571-273-7430 (fax).
Any inquiry relating to the status of this application, entry of papers into this application, or any other inquiries of a general nature concerning application processing should be directed to the Tech Center 2600 Customer Service Center at 571-272-2600 or to the USPTO Contact Center at 800-786-9199 or 571-272-1000.
The examiner can normally be reached on weekdays 7:30-4:00 Eastern Time.
If attempts to contact the examiner and the Customer Service Center are unsuccessful, supervisor Claire Wang can be contacted at 571-270-1051.

/Stephen M Brinich/
Examiner, Art Unit 2663