Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are currently pending.
The IDS filed on 10/19/2021 has been received and entered.

Examiner Notes
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  However, the claimed subject matter, not the specification, is the measure of the invention. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 7-9, and 12-14 are rejected under 35 U.S.C. 102 as anticipated by or, in the alternative, under 35 U.S.C. 103 as obvious over Cohen et al. (US 2019/0196698 A1).

    Regarding claim 1, Cohen teaches in at least Fig. 1 a system comprising: one or more processors (Fig. 1, processors 124); and a non-transitory computer-readable medium communicatively coupled to the one or more processors and storing program code executable by the one or more processors, the program code implementing a natural language-based image editor (at least a readable medium 126 of Fig. 1 storing said program code and implementing a natural language-based image editor 120) comprising:
an operation model configured to infer an image editing operation from a natural language request associated with a source image (in at least paras. 0020-0023 and 0047-0048 and Fig. 2, a plurality of natural language requests associated with a source image are received by model 144, which is indicative of an operation classifier model that infers and classifies at least the requested operations from the input data and which is obviously configured to infer an image editing operation from a natural language request associated with a source image);
a model configured to (a) locate an object or region of the source image that is inferred to correspond to the image editing operation (model 146 locates, in at least paras. 0047-0048 and 0055 and Fig. 5, an object or region of the source image that is inferred to correspond to the image editing operation); and (b) generate an image mask for the object or region (an image mask for the object or region is generated in at least para. 0021); and
an operation modular network configured to generate a modified source image by performing the image editing operation (further, in Figs. 5-6 and para. 0040, a module or component of the model 110, understood to comprise said operation modular network, generates, at S616-618, a modified source image by performing said image editing operation).
    Cohen teaches the claimed invention in at least Figs. 2 and 5-7 except for expressly teaching inferring using an operation classifier model and using a grounding model for generating said mask. It would have been obvious to one having ordinary skill in the art to have substituted a model of Cohen with another suitable model, since it has been held to be within the general skill of a worker in the art to select a known material, or here a specific model, on the basis of its suitability for the intended use as a matter of obvious design choice, according to known means and methods, to yield predictable results. Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. The combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and is therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).
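    For reference, the three-stage arrangement recited in claim 1 (operation model, mask-generating model, operation modular network) can be pictured with the minimal sketch below. It is illustrative only and is drawn neither from Cohen nor from the applicant's specification: the keyword lookup merely stands in for a trained operation model, the fixed rectangle stands in for a located object or region, and every name (`infer_operation`, `generate_mask`, `apply_operation`, `edit`) is hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]; hypothetical representation
Mask = List[List[bool]]

# Hypothetical keyword table standing in for a trained operation model.
KEYWORDS = {"brighten": "brightness_up", "darken": "brightness_down"}


def infer_operation(request: str) -> str:
    """Stage 1: infer an editing operation from the natural language request."""
    lowered = request.lower()
    for word, operation in KEYWORDS.items():
        if word in lowered:
            return operation
    raise ValueError(f"no known operation in request: {request!r}")


def generate_mask(image: Image, top: int, left: int, bottom: int, right: int) -> Mask:
    """Stage 2: build a mask for a region (a fixed box here, not a learned one)."""
    return [[top <= r < bottom and left <= c < right
             for c in range(len(row))]
            for r, row in enumerate(image)]


def apply_operation(image: Image, mask: Mask, operation: str, amount: float = 0.2) -> Image:
    """Stage 3: perform the operation only where the mask is set."""
    sign = 1.0 if operation == "brightness_up" else -1.0
    return [[min(1.0, max(0.0, px + sign * amount)) if selected else px
             for px, selected in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


def edit(image: Image, request: str) -> Image:
    """Chain the three stages; the region is hard-coded to the top-left pixel."""
    operation = infer_operation(request)
    mask = generate_mask(image, 0, 0, 1, 1)
    return apply_operation(image, mask, operation)
```

    In this sketch only the masked pixel changes, which mirrors the claimed relationship between the generated image mask and the modified source image; a real system would replace each stage with a learned model.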

    Regarding claim 2 (according to claim 1), Cohen further teaches wherein the operation classifier model comprises: a first neural network configured to encode the source image (Figs. 2 and 5-6 further teach receiving, at said neural networks, embedded or encoded source images);
a second neural network configured to encode the natural language request (Figs. 2 and 5-6 further teach receiving, at said neural networks understood to comprise said second neural network, embedded or encoded natural language requests);
and an operation classifier configured to infer the image editing operation in response to receiving the encoded source image and the encoded natural language request as an input (the system in Figs. 2 and 5-6 further implies ascertaining the requested operation by obviously inferring the image editing operation in response to receiving the encoded source image and the encoded natural language request as an input).

    Regarding claim 7 (according to claim 1), Cohen further teaches wherein the natural language request is generated based on audio data representing a natural language expression spoken by a user (Fig. 2 further illustrates that the natural language request is generated based on audio data representing a natural language expression spoken by a user).

    Regarding claim 8, Cohen teaches in at least Fig. 1 a computer-implemented method comprising: 
retrieving a source image and a natural language request (the system of Figs. 2 and 5-6 and at least paras. 0020-0023 and 0047-0048 retrieves source images and an inferred natural language request);
inferring, using an operation model, an image editing operation from the natural language request (at least paras. 0020-0023 and 0047-0048 further describe ascertaining and inferring, using an operation model 144, an image editing operation from the natural language request);
generating, using a model, an image mask for an object or region of the source image that is inferred to correspond to the image editing operation (model 146 locates, in at least paras. 0047-0048 and 0055 and Fig. 5, an object or region of the source image that is inferred to correspond to the image editing operation, and an image mask for the object or region is further generated in at least para. 0021); performing, using an operation modular network, the image editing operation on the source image (further, in Figs. 5-6 and para. 0040, the image editing operation is performed on the source image using a module or component of the model 110 comprising said operation modular network);
and outputting a modified source image (Figs. 5-6 further display said output modified source image).
    Cohen teaches the claimed invention in at least Figs. 2 and 5-7 except for expressly teaching inferring using an operation classifier model and using a grounding model for generating said mask. It would have been obvious to one having ordinary skill in the art to have substituted a model of Cohen with another suitable model, since it has been held to be within the general skill of a worker in the art to select a known material, or here a specific model, on the basis of its suitability for the intended use as a matter of obvious design choice, according to known means and methods, to yield predictable results. Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art. The combination is thus the adaptation of an old idea or invention using newer technology that is commonly available and understood in the art, and is therefore a variation on already known art (See MPEP 2143, KSR Exemplary Rationale F).

    Regarding claim 9 (according to claim 8), Cohen further teaches wherein inferring the image editing operation from the natural language request further comprises: encoding the source image using a first neural network (Figs. 2 and 5-6 further imply receiving, at a neural network understood to comprise said first neural network, embedded or encoded source images);
encoding the natural language request using a second neural network (Figs. 2 and 5-6 further imply receiving, at the neural network understood to comprise said second neural network, embedded or encoded inferred natural language requests);
and inferring, using an operation classifier, the image editing operation from the natural language request in response to receiving the encoded source image and the encoded natural language request as an input (the system in Figs. 2 and 5-6 further implies ascertaining the requested operation by obviously inferring the image editing operation in response to receiving the encoded source image and the encoded natural language request as an input).

    Regarding claim 12 (according to claim 8), Cohen further teaches wherein performing the image editing operation further comprises: inferring one or more parameters used for performing the image editing operation (the selection of filters or other attributes in para. 0140 is inferred for performing the image editing operation).

    Regarding claim 13 (according to claim 12), Cohen further teaches the method further comprising: generating a modified source image in response to receiving the source image, the natural language request, and the generated image mask (Figs. 5-7 remove the requested content and harmonize the image contents, further indicating said generating of a modified source image in response to receiving the source image, the natural language request, and the generated image mask);
and wherein the modified source image is modified according to the image editing operation (Figs. 2 and 5-7).

    Regarding claim 14 (according to claim 8), Cohen further teaches wherein the natural language request is generated based on audio data representing a natural language expression spoken by a user (user conversations of at least Figs. 2 and 5-7).

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 15-16 and 19-20 are rejected under 35 U.S.C. 102 as anticipated by Cohen et al.

    Regarding claim 15, Cohen teaches in at least paras. 0189-0190 a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause a processing apparatus to perform operations including:
retrieving a source image and a natural language request (Figs. 2 and 5-7 display at least retrieved source images and a user natural language request);
a step for editing the source image based on an image editing operation inferred from the natural language request (the editing operations of Figs. 2 and 5-7); and outputting a modified source image (the edited and harmonized source image is further displayed or output in Figs. 2 and 5-7).

    Regarding claim 16 (according to claim 15), Cohen further teaches wherein the step for editing the source image further comprises: inferring, in at least Figs. 2 and 5-7, the image editing operation from the natural language request by: encoding the source image using a first neural network (Figs. 2 and 5-6 further imply receiving, at a neural network understood to comprise said first neural network, embedded or encoded source images);
encoding the natural language request using a second neural network (Figs. 2 and 5-6 further imply receiving, at the neural network understood to comprise said second neural network, embedded or encoded inferred natural language requests);
and inferring the image editing operation from the natural language request in response to receiving the encoded source image and the encoded natural language request as an input (the system in Figs. 2 and 5-6 further implies ascertaining the requested operation by obviously inferring the image editing operation in response to receiving the encoded source image and the encoded natural language request as an input).

    Regarding claim 19 (according to claim 15), Cohen further teaches wherein the operations further comprise: performing the image editing operation by inferring one or more parameters used for performing the image editing operation (the selection of filters or other attributes in para. 0140 is inferred for performing the image editing operation).

    Regarding claim 20 (according to claim 15), Cohen further teaches wherein the operation of outputting the modified source image further comprises: generating the modified source image in response to receiving the source image, the natural language request, and the generated image mask (Figs. 5-7 remove the requested content and harmonize the image contents, further indicating said generating of the modified source image in response to receiving the source image, the natural language request, and the generated image mask);
and wherein the modified source image is modified according to the image editing operation (Figs. 2 and 5-7).

Claims Standings
Claims 3-6, 10-11, and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art does not appear to teach:
3. The system of claim 1, wherein the grounding model further comprises: an attention layer configured to classify the image editing operation as a local operator or a global operator, wherein the local operator applies to a local area of the source image, and wherein the global operator applies to an entirety of the source image; and a language attention network configured to ground the image editing operation to the object or region within the source image when the image editing operation is classified as a local operator.
4. The system of claim 3, wherein the language attention network further comprises: a subject module configured to generate a subject attention weight indicating a relevance between the object or region of the source image and a subject depicted in the source image; a location module configured to generate a location attention weight indicating a relevance between the object or region of the source image and a location of the subject depicted in the source image; a relationship module configured to generate a relationship attention weight indicating a relevance between the object or region of the source image and another object depicted within the source image; and an operation attention module configured to locate the object or region within the source image by modifying the subject attention weight, the location attention weight, and the relationship attention weight using an operation attention weight.
5. The system of claim 1, wherein the operation modular network comprises: a submodule for the image editing operation inferred from the natural language request, the submodule configured to infer one or more parameters used for performing the image editing operation.  
6. The system of claim 5, wherein the submodule for the image editing operation includes a differentiable filter that modifies the source image based on the source image, the natural language request, and the generated image mask, and wherein the modified source image is modified according to the image editing operation.
10. The computer-implemented method of claim 8, wherein generating the image mask further comprises: classifying, using an operation region model, the image editing operation as a local operator or a global operator, wherein the local operator applies to a local area of the source image, and wherein the global operator applies to an entirety of the source image; and grounding, using a language attention network, the image editing operation to the object or region within the source image when the image editing operation is classified as a local operator.  
11. The computer-implemented method of claim 10, further comprising: generating a subject attention weight indicating a relevance between the object or region of the source image and a subject depicted in the source image; generating a location attention weight indicating a relevance between the object or region of the source image and a location of the subject depicted in the source image; generating a relationship attention weight indicating a relevance between the object or region of the source image and another object depicted within the source image; and locating the object or region within the source image by modifying the subject attention weight, the location attention weight, and the relationship attention weight using an operation attention weight.
17. The non-transitory machine-readable storage medium of claim 15, wherein the step for editing the source image further comprises: generating an image mask by: classifying the image editing operation as a local operator or a global operator, wherein the local operator applies to a local area of the source image, and wherein the global operator applies to an entirety of the source image; and grounding the image editing operation to an object or region within the source image when the image editing operation is classified as a local operator.
18. The non-transitory machine-readable storage medium of claim 17, wherein the step for editing the source image further comprises: generating a subject attention weight indicating a relevance between the object or region of the source image and a subject depicted in the source image; generating a location attention weight indicating a relevance between the object or region of the source image and a location of the subject depicted in the source image; generating a relationship attention weight indicating a relevance between the object or region of the source image and another object depicted within the source image; and locating the object or region within the source image by modifying the subject attention weight, the location attention weight, and the relationship attention weight using an operation attention weight.
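The weight-modification limitation recited in claims 4, 11, and 18 can be pictured with the minimal sketch below. It is one possible reading only: the claims do not state how the operation attention weight modifies the other three weights, so the element-wise scaling followed by renormalization, the per-region module scores, and all names (`modulate`, `locate`, `MODULES`) are assumptions made for illustration and are not taken from the application or from Cohen.

```python
from typing import Dict, List, Tuple

MODULES = ("subject", "location", "relationship")


def modulate(weights: Dict[str, float], op_weights: Dict[str, float]) -> Dict[str, float]:
    """Scale each module weight by its operation attention weight, then renormalize.

    The multiplicative form is an assumption; the claims only say "modifying".
    """
    scaled = {m: weights[m] * op_weights[m] for m in MODULES}
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("all modulated weights are zero")
    return {m: value / total for m, value in scaled.items()}


def locate(regions: List[Tuple[str, Dict[str, float]]],
           weights: Dict[str, float],
           op_weights: Dict[str, float]) -> str:
    """Return the name of the candidate region with the highest weighted score."""
    final = modulate(weights, op_weights)

    def score(entry: Tuple[str, Dict[str, float]]) -> float:
        _, module_scores = entry
        return sum(final[m] * module_scores[m] for m in MODULES)

    return max(regions, key=score)[0]
```

Under these assumptions, raising the operation attention weight on the location module shifts the selection toward regions that score well on location, which is the kind of operation-dependent grounding the objected-to claims recite.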


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCELLUS AUGUSTIN whose telephone number is (571)270-3384. The examiner can normally be reached 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BENNY TIEU, can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MARCELLUS J AUGUSTIN/
Primary Examiner, Art Unit 2674
08/08/2022