DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to applicant’s amendment/arguments filed on 05/03/2022. This action is made FINAL.

Allowable Subject Matter
Claims 1-17 are allowed.
Response to Arguments
Applicant’s arguments with respect to claims 18-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (WO 2017112813 A1).

Claim 18. Wang et al. disclose an apparatus for image processing (read as a device or system that includes a multi-modal virtual personal assistant 150 [0052], FIG. 1), comprising:
a memory (memory [0214]); and 
at least one processor, wherein the at least one processor is configured to process instructions stored in the memory (read as A computer-readable medium and computer-program products may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements [0531]) to cause: 
an image encoder to generate an image feature vector based on an image frame (read as The coding processor 2130 can analyze and encode iris information from the iris image generated by the pre-processor 2110 [0287]); 
a language encoder to produce an expression embedding based on a referral expression (read as These audio and image analysis tools can analyze audio and visual input, respectively, to understand and interpret (and, in some cases, also, reason) a particular aspect of the input [0196]); 
a cross-attention module to generate a cross-attention vector based on the image frame and the expression embedding (read as These audio and image analysis tools can analyze audio and visual input, respectively, to understand and interpret (and, in some cases, also, reason) a particular aspect of the input [0196]. Analyzing an image with the audio input can be a cross-attention function.); 
a memory encoder to generate a memory feature vector based on a memory image frame (read as The coding processor 2130 can analyze and encode iris information from the iris image generated by the pre-processor 2110 [0287]) and a memory image mask (read as The object and gesture models 436 can also be trained or programmed with domain-specific object and gesture samples. For example, the object and gesture models 436 can be trained or programmed with images of domain-specific objects and video sequences of domain-specific gestures. For example, when the domain is related to baseball, the object and gesture models 436 can be trained or programmed to recognize umpires' calls, and to distinguish one team's uniform from another [0122]); 
a memory attention module to generate a memory attention vector based on the memory feature vector and a first output of the image encoder (read as The object and gesture models 436 can also be trained or programmed with domain-specific object and gesture samples. For example, the object and gesture models 436 can be trained or programmed with images of domain-specific objects and video sequences of domain-specific gestures. For example, when the domain is related to baseball, the object and gesture models 436 can be trained or programmed to recognize umpires' calls, and to distinguish one team's uniform from another [0122]); and 
a decoder to generate an image mask based on the image feature vector, the cross-attention vector (read as These audio and image analysis tools can analyze audio and visual input, respectively, to understand and interpret (and, in some cases, also, reason) a particular aspect of the input [0196]. Analyzing an image with the audio input can be a cross-attention function.), and the memory attention vector (FIG. 13 shows a system which combines audio descriptions, image and knowledge stored in memory (trained neural network).).
Combined teachings from different embodiments of Wang et al. were used in the rejection.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Wang et al. in order to realize all limitations of the claimed invention, namely the idea of training a neural network to use natural language to describe elements included in a video. The motivation is to realize a system and method capable of providing conversational responses in multiple languages associated with items in a video.

Claim 19. Regarding the apparatus of claim 18, Wang et al. disclose,
wherein: 
the image encoder comprises a first intermediate stage configured to provide first feature information to the decoder and a second intermediate stage configured to provide second feature information to the decoder (read as The object and gesture models 436 can also be trained or programmed with domain-specific object and gesture samples. For example, the object and gesture models 436 can be trained or programmed with images of domain-specific objects and video sequences of domain-specific gestures. For example, when the domain is related to baseball, the object and gesture models 436 can be trained or programmed to recognize umpires' calls, and to distinguish one team's uniform from another [0122]).

Claim 20. Regarding the apparatus of claim 18, Wang et al. disclose,
wherein: 
the decoder comprises a first refinement stage configured to receive the cross-attention vector and a second refinement stage configured to receive the memory feature vector (FIG. 13 shows a system which combines audio descriptions, image and knowledge stored in memory (trained neural network).).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED RACHEDINE whose telephone number is (571)272-9249. The examiner can normally be reached Mon-Fri 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lester Kincaid, can be reached at (571)272-7922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MOHAMMED RACHEDINE
Examiner
Art Unit 2649



/MOHAMMED RACHEDINE/           Primary Examiner, Art Unit 2646