DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 21-45 are rejected under 35 U.S.C. 103 as being unpatentable over US 10984758 to Croxford (“Croxford”) in view of US 20210049468 to Karras (“Karras”).
Regarding Claim 21:  “An apparatus, comprising:
a processor circuit; and … a memory storing instructions which when executed by the processor circuit cause the processor circuit to:  (“a neural network may be implemented using a more general processor, such as a CPU or GPU of the display controller 124. The display controller 124 may be implemented using machine readable instructions and suitably programmed or configured hardware, such as circuitry.”  Croxford, Column 13, lines 1-8.  Also see Karras, Paragraph 235.)
generate a first feature map of a first image by a first layer of a neural network, the neural network to encode the first image;  (“generate a series of feature maps using each of the convolutional layer(s) 116. The feature maps for example represent image features present in the first portion 202 of the input frame 200”  Croxford, Column 9, lines 50-53.)
compute, by an attention layer of the neural network based on the first feature map of the first image, an adaptive spatial saliency map for the first feature map of the first image; and  (“the CNN 114 may include additional processing not shown in FIG. 2, such as the processing of a feature map generated by a convolutional layer 116 with a suitable activation feature (sometimes referred to as a rectifier), which is a non-linear function used to map an input to a predefined output, such as a value which is 0 or greater, e.g. the rectified linear unit (ReLU) function.”  This provides non-zero mapping and weighing for important features [adaptive saliency mapping] and zero weighing for other features.  Croxford, Column 9, lines 56-61.  See details particular to “attention” mapping/masking in Karras, Paragraph 221 and statement of motivation below.)
perform an element-wise multiplication of the first feature map and the adaptive spatial saliency map for the first feature map to generate a modulated feature map to (“a suitable activation feature (sometimes referred to as a rectifier), which is a non-linear function used to map an input to a predefined output, such as a value which is 0 or greater, e.g. the rectified linear unit (ReLU) function.”  This provides non-zero mapping and weighing [multiplication] for important features and zero weighing for other features.  Croxford, Column 9, lines 56-61.)
encode the first image.”  (“display controller 124 is configured to compress [encode] the first enhanced data before sending the first enhanced data to the display interface 128 … the display controller 124 includes an encoder 132 configured to perform the compression.”  Croxford, Column 13, lines 23-39.)
The text of Croxford alludes to use of portion masks and further feature masks to identify and weigh important features to be considered by the neural network, per claim language “element-wise multiplication of the first feature map and the adaptive spatial saliency map”, but Croxford does not describe this feature in a single, clear passage that identifies the masks and the image features they can emphasize.
Karras teaches the above feature in the context of encoding video using neural networks to emphasize the quality of human perceived image features:  “generate an alpha mask (channel) or matte indicating the separate foreground and background portions. … The foreground portion comprises at least the face … the attention of the discriminator neural network 875 may be focused on the semantically critical areas of the face, … cause the synthesis neural network 715 to devote additional capacity to the regions of the reconstructed image that will be most important to a human viewer.”   Karras, Paragraphs 218-221.  
Therefore, before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to supplement the teachings of Croxford to perform the masking steps as taught in Karras, in order to identify and weigh important portions and features to be considered by the neural network.
Finally, in reviewing the present application, there does not seem to be objective evidence that the claim limitations are particularly directed to: addressing a particular problem which was recognized but unsolved in the art, producing unexpected results at the level of the ordinary skill in the art, or any other objective indicators of non-obviousness.  
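For illustration only, the claimed “element-wise multiplication of the first feature map and the adaptive spatial saliency map” can be sketched as follows. The sketch is not taken from Croxford, Karras, or the application; the function name `modulate` and the sample values are assumptions chosen for the example.

```python
def modulate(feature_map, saliency_map):
    """Element-wise product of two equally sized 2-D maps (lists of rows)."""
    if len(feature_map) != len(saliency_map):
        raise ValueError("maps must have the same number of rows")
    modulated = []
    for f_row, s_row in zip(feature_map, saliency_map):
        if len(f_row) != len(s_row):
            raise ValueError("maps must have the same number of columns")
        # Each feature value is scaled by the saliency weight at the same position.
        modulated.append([f * s for f, s in zip(f_row, s_row)])
    return modulated


# Assumed sample values: a 2x2 feature map and a 2x2 saliency map.
feature_map = [[1.0, 2.0], [3.0, 4.0]]
saliency_map = [[0.5, 0.0], [1.0, 0.25]]
modulated = modulate(feature_map, saliency_map)
```

In this sketch a saliency weight of zero suppresses the feature at that position, and a weight of one passes it through unchanged.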
	
Regarding Claim 22:  “The apparatus of claim 21, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: apply, by a SoftMax layer of the neural network, a SoftMax function to the adaptive spatial saliency map for the first feature map prior to performing the elementwise multiplication, the SoftMax function to cause a sum of each value in the adaptive spatial saliency map to equal one.”  (Note that the claim does not provide a limitation for the SoftMax layer or function and the Specification does not define this term.  Under the broadest reasonable interpretation consistent with the specification and ordinary skill in the art, the SoftMax function is interpreted to be a normalization function performed by an additional neural network layer.
Prior art teaches that “a deep learning (DL) ML algorithm may be used, which uses multiple layers” and thus implements multiple functions by the neural network.  Croxford, Column 13, lines 2-5.  Further, “In an embodiment, the first style signal controls an adaptive instance normalization (AdaIN) operation within the first layer 120 of the synthesis network 140,” one. Karras, Paragraphs 37 and 63.  See statement of motivation in Claim 21.)
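For illustration only, the conventional softmax function applied over every position of a saliency map, so that the values sum to one as recited in Claim 22, can be sketched as follows. This is the standard mathematical definition rather than anything quoted from the references or the application; the function name `spatial_softmax` and the sample values are assumptions.

```python
import math


def spatial_softmax(saliency_map):
    """Softmax over all positions of a 2-D map; the outputs sum to one."""
    peak = max(v for row in saliency_map for v in row)
    # Subtracting the maximum before exponentiating avoids overflow
    # and does not change the result.
    exps = [[math.exp(v - peak) for v in row] for row in saliency_map]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


# Assumed sample values for a 2x2 saliency map.
raw_map = [[0.2, 1.5], [-0.7, 3.0]]
normalized_map = spatial_softmax(raw_map)
```

Every output is positive, larger inputs receive larger weights, and the weights over the whole map sum to one.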
Regarding Claim 23:  “The apparatus of claim 22, the first image comprising a plurality of channels, the SoftMax function of the SoftMax layer applied to at least one of: (i) each of the plurality of channels collectively, and (ii) each of the plurality of channels individually.”  (The claim is unclear as to the definition of channels in an image.  Under the broadest reasonable interpretation consistent with the specification and ordinary skill in the art, the channels represent sub-sections of the image, such as regions or feature maps.  Prior art teaches an embodiment of this, “where each feature map X, is normalized separately,” in other words, individually.  Karras, Paragraph 63 and statement of motivation in Claim 21.)
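For illustration only, the two alternatives recited in Claim 23 can be contrasted in a short sketch: normalizing each channel on its own versus normalizing all channels together. The function names, the representation of a channel as a flat list of values, and the sample values are assumptions and do not come from the references or the application.

```python
import math


def softmax(values):
    """Standard softmax over a flat list of values."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_per_channel(channels):
    """Alternative (ii): each channel is normalized individually."""
    return [softmax(channel) for channel in channels]


def softmax_across_channels(channels):
    """Alternative (i): all channels are normalized collectively."""
    flat = softmax([v for channel in channels for v in channel])
    out, start = [], 0
    for channel in channels:
        out.append(flat[start:start + len(channel)])
        start += len(channel)
    return out


# Assumed sample values: two channels with three positions each.
channels = [[0.1, 0.9, 0.3], [1.2, 0.4, 0.0]]
individually = softmax_per_channel(channels)
collectively = softmax_across_channels(channels)
```

In the individual case each channel sums to one; in the collective case the sum over all channels together is one.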
Regarding Claim 24:  “The apparatus of claim 21, the neural network comprising a convolutional neural network, the adaptive spatial saliency map to cause the convolutional neural network to allocate a first number of bits … the first number of bits greater than a second number of bits allocated to the first portion of the first image by the neural network without the attention layer.”  (Under the broadest reasonable interpretation consistent with the specification and ordinary skill in the art, number of bits in an image can encode resolution, dynamic range, or color range.  Prior art teaches this:  “enhancing at least one characteristic of the input frame, such as a resolution, dynamic range and/or color range of the input frame” Croxford, Column 2, lines 56-58.)
when encoding a first portion of the first image, the first portion of the first image depicting a visual element associated with a human visual system (HVS), the element comprising at least one of: (i) a face, (ii) a high contrast region, (iii) a human, and (iv) an object, (For example “The feature maps for example represent image features present in the first portion 202 of the input frame 200 such as corners or lines,” in other words, high-contrast regions.  Croxford, Column 9, lines 50-53.  Also, “For example, if the first portion of the input frame includes an image of a cat [object], the DL system may generate pixel values that accurately replicate the texture of cat fur [high contrast region].”  Croxford, Column 3, lines 11-14.  Also note that the enhanced image objects can be humans and faces in Karras, Paragraph 4 and statement of motivation in Claim 21.)
Regarding Claim 25:  “The apparatus of claim 21, the first image of a video comprising at least the first image and a second image, the second image adjacent and prior to the first image in the video, the neural network to encode the video according to at least one encoding format, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: … concatenate the adaptive spatial saliency map for the first image and an adaptive spatial saliency map for the second image; and  (Under the broadest reasonable interpretation consistent with the specification and ordinary skill in the art, concatenation of the saliency maps of the current and the previous image can provide a map of image portions that can be reused from the previously coded image and identify image portions that need to be further coded in the present image.
Prior art provides examples of this:  “In some cases in which at least one portion of a given frame is the same as (or relatively similar to) a corresponding portion of a previous frame, processing of the at least one portion of the given frame (and sending of data representative of the at least one portion of the given frame to the display device) may be omitted, to reduce resource usage. In such cases, the corresponding portion of the previous frame may be used in place of the at least one portion of the given frame, during display of the given frame. … Data representative of the at least one further portion of the input frame 100 may be processed to generate the further data, … For example, such further processing may include processing of at least one further portion of the input frame using the image enhancement scheme.”  Croxford, Column 5, lines 43-66.)
compute, by a second layer of the neural network, a spatial and temporal saliency map for the first image based on the concatenation of the adaptive spatial saliency map for the first image and the adaptive spatial saliency map for the second image.”  (“The feature maps for example represent image features present in the first portion 202 of the input frame 200”  Croxford, Column 9, lines 50-53.  And, as noted above with respect to Croxford, Column 5, lines 43-66, features that are reused from the previous frame can be omitted.)
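For illustration only, the two steps recited in Claim 25 (concatenating the saliency maps of the current and previous images, then computing a combined map from the concatenation) can be sketched as follows. The sketch assumes the second layer is a per-position weighted sum across the two stacked maps, similar to a 1x1 convolution; that choice, the function names, the weights, and the sample values are all assumptions and are not drawn from the references or the application.

```python
def concatenate_maps(current_map, previous_map):
    """Stack two H x W maps along a new leading channel axis (2 x H x W)."""
    same_rows = len(current_map) == len(previous_map)
    same_cols = same_rows and all(
        len(a) == len(b) for a, b in zip(current_map, previous_map)
    )
    if not same_cols:
        raise ValueError("saliency maps must have the same shape")
    return [current_map, previous_map]


def fuse(stacked, weights):
    """Hypothetical second layer: weighted sum across channels per position."""
    height, width = len(stacked[0]), len(stacked[0][0])
    return [
        [sum(w * ch[r][c] for w, ch in zip(weights, stacked)) for c in range(width)]
        for r in range(height)
    ]


# Assumed sample values: 1x2 saliency maps for the current and previous image.
current_map = [[0.2, 0.8]]
previous_map = [[0.4, 0.6]]
stacked = concatenate_maps(current_map, previous_map)
spatiotemporal_map = fuse(stacked, weights=(0.75, 0.25))
```

Each output position depends on the saliency at that position in both images, which is the sense in which the result is both spatial and temporal in this sketch.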
Claim 26 is rejected for reasons stated for Claim 22 in view of Claim 25 rejection.
Claim 27 is rejected for reasons stated for Claims 21 and 23 in view of Claim 26 rejection.
Claim 28 is rejected for reasons stated for Claim 24 in view of Claim 27 rejection.
Regarding Claim 29:  “The apparatus of claim 21, wherein the neural network is to encode the image according to at least one encoding format, the memory storing instructions which when executed by the processor circuit cause the processor circuit to: apply a batch normalization process to the modulated feature map.”  (“The style control signals for different layers of the synthesis network may be generated from the same [batch] or different latent codes.”  Karras, Paragraph 32.  “the first style signal controls an adaptive instance normalization (AdaIN) operation within the first layer 120 of the synthesis network 140.”  Karras, Paragraph 37.  See statement of motivation in Claim 21.)
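For illustration only, batch normalization in its conventional sense (subtracting the mean and dividing by the standard deviation computed across a batch) can be sketched as follows for a single channel. The learned scale and shift parameters are omitted for brevity; the function name, the `eps` value, and the sample values are assumptions and do not come from the references or the application.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize one channel using statistics over the whole batch.

    `batch` is a list of samples; each sample is a flat list of the
    channel's values for that sample.
    """
    values = [v for sample in batch for v in sample]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division well defined when the variance is zero.
    scale = 1.0 / math.sqrt(variance + eps)
    return [[(v - mean) * scale for v in sample] for sample in batch]


# Assumed sample values: a batch of two modulated feature maps, two values each.
batch = [[1.0, 2.0], [3.0, 4.0]]
normalized_batch = batch_normalize(batch)
```

After normalization the values across the batch have approximately zero mean and unit variance.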
Regarding Claim 30:  “The apparatus of claim 21, the memory storing instructions to train the neural network which when executed by the processor circuit cause the processor circuit to:
configure an output of each attention layer in the neural network to equal one;  (“The style-based generator system 100 and discriminator 275 are trained simultaneously using a training dataset that includes example output data that the output data produced by the style-based generator system 100 should be consistent with. … instance of training data contains a correct answer (e.g., classification),”  Karras, Paragraph 77 and related training discussion in Paragraphs 164-165.
Although Karras does not teach that a mask element has a value of one (or zero), under the broadest reasonable interpretation consistent with the specification and ordinary skill in the art, the claim creates a mask (ones and zeros) differentiating the background part of the image from the more important objects in the image for purposes of training.  Karras teaches this functionality:  “The synthesis neural network 715 may also be configured to generate an alpha mask (channel) or matte indicating the separate foreground and background portions. … During training, the foreground portion and alpha mask for each reconstructed image may be shifted …” Karras, Paragraphs 218-219.  See statement of motivation in Claim 21.)
pretrain weights of the neural network where gradients flow into routes that do not include the attention layers during backward propagation; and  (“Parameters (e.g., weights) of the mapping neural network 110 are learned during training and the parameters are used to process the input latent codes when the style-based generator system 100 is deployed to generate the output data.”  Karras, Paragraph 56 and statement of motivation in Claim 21.)
refine the neural network by retraining all pretrained weights and weights in each attention layer such that gradients flow into all routes of the neural network until convergence is reached.”  (“The weights (which for example correspond with respective elements of kernels associated with layers of the NN) may be adjusted throughout training, altering the output of individual neurons and hence of the NN as a whole.”  Croxford, Column 9, lines 38-42.)
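For illustration only, the three training steps recited in Claim 30 can be sketched with a toy model that has one ordinary weight `w` and one attention gate `a`, with output `w * a * x`. In the first stage the gate is held at one and only `w` receives gradient updates; in the second stage both are updated. The model, the squared-error loss, the learning rate, the step counts, and the sample data are all assumptions made for the example and are not drawn from the references or the application.

```python
def train_two_stage(samples, lr=0.02, pretrain_steps=5, refine_steps=200):
    """Toy two-stage training of y = w * a * x with squared-error loss."""
    w = 0.0  # ordinary network weight
    a = 1.0  # attention output, configured to equal one before pretraining

    # Stage 1 (pretraining): gradients update w only; the attention gate
    # stays at one, so no gradient is applied to it.
    for _ in range(pretrain_steps):
        for x, target in samples:
            error = w * a * x - target
            w -= lr * 2.0 * error * a * x

    # Stage 2 (refinement): gradients update every parameter, including
    # the attention gate.
    for _ in range(refine_steps):
        for x, target in samples:
            error = w * a * x - target
            grad_w = 2.0 * error * a * x
            grad_a = 2.0 * error * w * x
            w -= lr * grad_w
            a -= lr * grad_a
    return w, a


# Assumed sample data generated by the relationship target = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0)]
w, a = train_two_stage(samples)
```

With these assumed settings the product `w * a` approaches two, the slope of the sample data, once the refinement stage has run.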
Claim 31, “A non-transitory computer-readable storage medium comprising instructions that when executed by a processor of a computing device, cause the processor to: …,” is rejected for reasons stated for Claim 21, and because prior art teaches:  “a neural network may be implemented using a more general processor, such as a CPU or GPU of the display controller 124. The display controller 124 may be implemented using machine readable instructions and suitably programmed or configured hardware, such as circuitry.”  Croxford, Column 13, lines 1-8.
Claims 32-40 are rejected for reasons stated for Claims 22-30 respectively in view of the Claim 31 rejection.
Claim 41, “A method,” is rejected for reasons stated for Claim 21, because the processor executed steps of Claim 21 perform the method steps of Claim 41.
Claims 42-45 are rejected for reasons stated for Claims 22-25 respectively in view of the Claim 41 rejection.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIKHAIL ITSKOVICH whose telephone number is (571)270-7940. The examiner can normally be reached Mon. - Thu. 9am - 8pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Joseph Ustaris can be reached on (571)272-7383. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MIKHAIL ITSKOVICH/Primary Examiner, Art Unit 2483