DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Prior art cited in this Office action:
Lu et al. (WO 2018094296 A1, hereinafter “Lu”)
Teng et al. (CN 111368528 A, hereinafter “Teng”)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (WO 2018094296 A1, hereinafter “Lu”) in view of Teng et al. (CN 111368528 A, hereinafter “Teng”).
Regarding claim 1:
Lu teaches an image caption apparatus (Lu abstract, [0009], where Lu teaches an image caption apparatus and model) comprising:
an encoder which encodes an input image (Lu [0009], [0012], [0015], [0020], where Lu teaches an encoder which encodes an input image); and
a decoder which receives an output of the encoder (Lu [0020]-[0022], [0256], [0258], figs. 12, 13 and 15, where Lu discloses that the decoder can receive input from the output of the encoder and that the encoder can be an LSTM or a CNN/RNN, etc.). 
(Note: the limitations following the wherein clause have not been given any patentable weight, since they only describe what the decoder contains without outputting any result of the decoding or performing any decoding. Nonetheless, for completeness' sake, the claim limitations are further addressed to give applicant a chance to address them.)
wherein the decoder comprises:
a first long short-term memory (LSTM) configured to operate in cooperation with a second LSTM to respectively generate a first hidden vector and a second hidden vector, wherein the second hidden vector is used to generate a word for an output caption (Lu [0020]-[0022], [0038], [0198], [02000], [0256], [0258], figs. 8, 13, 14 and 15, where Lu discloses an LSTM that is configured to operate in cooperation with one or more other LSTMs that can use the same cell or be used recursively, such that a corresponding hidden vector is formed, and the hidden vector can be used to generate caption words);
a LSTM cell configured to be used for both the first LSTM and the second LSTM to generate the first hidden vector or the second hidden vector, wherein a personality embedding vector fed into the LSTM cell is employed to modulate an input signal of visual and language features of the internal gates of the LSTM cell (Lu [0098], [0198], [0258], [0288], [0304]-[0306], [0312], [0321], [0322], figs. 8, 13, 14 and 15, where it can be seen that the LSTM cell is reused across multiple time steps and that a personality embedding vector is fed into the LSTM to modulate the input signal (visual and language)); and
embedding vector at each word generation step before the personality embedding vector is fed into the LSTM cell (Lu [0098], [0198], [0258], [0288], [0304]-[0306], [0312], [0321], [0322], figs. 8, 13, 14 and 15, where Lu teaches using an embedder to generate an embedding vector at each word generation step, and the output is fed back to the LSTM).
Lu fails to explicitly teach a personality controller configured to decay the personality embedding vector at each word generation step.
However, Lu teaches determining unnormalized attention values for the image feature vectors, exponentially normalizing the attention values to produce the word output that is fed back to the LSTM, and determining how much influence the previous caption word has on the next caption word (Lu [0022]-[0023], [0055], [00176]-[00179]). Teng further teaches a self-attention mechanism that includes the parameter weights of the backward LSTM with a normalized exponential function to calculate the next word (Teng [0025]-[00035], [0072]-[0080], claim 4).
Therefore, taking the teachings of Lu and Teng as a whole, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the application to use a decaying function to decay the output word that serves as input to the LSTM, as an obvious option, in order to control how much the previous word affects the current word, since a decaying function is well known and provides good control with an expected result.
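For illustration only, the kind of decaying function referred to above can be sketched as a fixed exponential decay applied to a vector at each word generation step. This is a minimal sketch under stated assumptions: the function names, the exponential form, and the `rate` parameter are hypothetical and are not taken from the application, Lu, or Teng.

```python
import math

def decay_embedding(embedding, step, rate=0.5):
    """Scale every component by exp(-rate * step); step 0 leaves the vector unchanged.

    Assumption: a fixed exponential schedule. The record does not specify
    which decaying function is used.
    """
    factor = math.exp(-rate * step)
    return [factor * x for x in embedding]

def decayed_sequence(embedding, num_steps, rate=0.5):
    """Return the decayed vector for each word generation step 0..num_steps-1."""
    return [decay_embedding(embedding, t, rate) for t in range(num_steps)]

if __name__ == "__main__":
    for t, vec in enumerate(decayed_sequence([1.0, -2.0], 4)):
        print(t, vec)
```

A larger `rate` makes the vector's influence on later words fall off faster; a rate of zero leaves it constant across steps.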
Regarding claim 2:
Lu in view of Teng teaches wherein the first LSTM corresponds with a visual attention model and the second LSTM corresponds with a language model (Lu abstract, [0009], [0017], [0041], [0083], [00100]-[00101], [00198], [00209]-[00212], where Lu teaches selecting between the visual attention model and the language model based on which one is more important).
Regarding claim 3:
Lu in view of Teng teaches wherein the output of the encoder comprises a feature map representing the input image and an average feature vector representing the global information of the input image (Lu [0015], [0055], [0069]-[0075]),
the decoder is further configured to calculate a visual context vector based on the first hidden vector and the feature map, the first LSTM calculates the first hidden vector using a first input vector that is a combination of the average feature vector, a previous second hidden vector and a previously generated word (Lu [0076], [0098], [00105]),
the second LSTM calculates the second hidden vector using a second input vector that is a combination of the first hidden vector and the visual context vector (Lu [0076], [0098], [00105], figs. 12-15), and
the input signal is formed based on the first input vector and a previous first hidden vector when the LSTM cell is used for the first LSTM and the input signal is formed based on the second input vector and the previous second hidden vector when the LSTM cell is used for the second LSTM (Lu [0076], [0098], [00105], [00109], [00118], [00124]-[00128], [0161]-[00162], figs. 12-15; Teng [0025]-[00035], [0072]-[0080], claim 4).
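For illustration only, the data flow recited in claim 3 can be sketched as follows: a visual context vector is computed from the first hidden vector and the feature map, and the two LSTM input vectors are formed by concatenation. This is a minimal sketch under stated assumptions: dot-product scoring, softmax weighting, concatenation as the "combination", and all function names are hypothetical and are not taken from the application, Lu, or Teng.

```python
import math

def softmax(scores):
    """Normalized exponential function, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_context(hidden, feature_map):
    """Weight each region of the feature map by its (assumed) dot-product
    score against the first hidden vector and return the weighted sum."""
    scores = [sum(h * f for h, f in zip(hidden, region)) for region in feature_map]
    weights = softmax(scores)
    dim = len(feature_map[0])
    context = [sum(w * region[i] for w, region in zip(weights, feature_map))
               for i in range(dim)]
    return context, weights

def first_input(avg_feature, prev_h2, prev_word_vec):
    """First input vector: average feature vector, previous second hidden
    vector, and previously generated word (assumed concatenation)."""
    return avg_feature + prev_h2 + prev_word_vec

def second_input(h1, context):
    """Second input vector: first hidden vector and visual context vector."""
    return h1 + context

if __name__ == "__main__":
    fmap = [[1.0, 0.0], [0.0, 1.0]]
    ctx, w = visual_context([10.0, 0.0], fmap)
    print(w, ctx, second_input([10.0, 0.0], ctx))
```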
Regarding claim 4:
Lu in view of Teng teaches wherein a feature-wise transformation layer is coupled to each internal gate of the LSTM cell wherein the feature-wise transformation layer comprises a conditional layer normalization which receives the input signal and the personality embedding vector as an input (Lu [0098], [0198], [0258], [0288], [0304]-[0306], [0312], [0321], [0322], figs. 8, 14, 13 and 15; Teng [0025]-[00035], [0072]-[0080], claim 4).
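For illustration only, a conditional layer normalization that receives an input signal and a personality embedding vector can be sketched as follows. This is a minimal sketch under stated assumptions: the gain and bias are taken to be linear functions of the personality embedding vector, and the names `w_gain` and `w_bias` are hypothetical; none of this is taken from the application, Lu, or Teng.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def conditional_layer_norm(signal, personality, w_gain, w_bias, eps=1e-5):
    """Normalize the signal to zero mean and unit variance, then apply a
    per-feature gain and bias computed from the personality vector.

    Assumption: gain = 1 + w_gain @ personality, bias = w_bias @ personality,
    so a zero personality vector reduces to plain layer normalization.
    """
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    normed = [(x - mean) / math.sqrt(var + eps) for x in signal]
    gain = [1.0 + g for g in matvec(w_gain, personality)]
    bias = matvec(w_bias, personality)
    return [g * n + b for g, n, b in zip(gain, normed, bias)]

if __name__ == "__main__":
    print(conditional_layer_norm([1.0, 2.0, 3.0], [1.0],
                                 [[0.0]] * 3, [[1.0]] * 3))
```

In a feature-wise arrangement, one such layer would sit in front of each internal gate, so the same personality vector could shift and scale each gate's pre-activation differently.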
Regarding claim 5:
Lu in view of Teng teaches wherein the personality controller comprises a controller vector to determine the amount of information from a previous personality embedding vector to be kept in the personality embedding vector (Lu [0076], [0098], [00105], [0124]-[0125], [00169], [00190]).
Regarding claim 6:
Lu in view of Teng teaches wherein the personality controller comprises a controller vector to determine the amount of information from a previous personality embedding vector to be kept in the personality embedding vector (Lu [0098], [0198], [0258], [0288], [0304]-[0306], [0312], [0321], [0322], figs. 8, 14, 13 and 15; Teng [0025]-[00035], [0072]-[0080], claim 4).
Regarding claim 7:
Lu in view of Teng teaches wherein the controller vector is calculated based on the previous first hidden vector and the previous second hidden vector (Lu [0098], [0198], [0258], [0288], [0304]-[0306], [0312], [0321], [0322], figs. 8, 13, 14 and 15; Teng [0025]-[00035], [0072]-[0080], claim 4).
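For illustration only, the controller vector recited in claims 5-7 can be sketched as a sigmoid gate computed from the previous first and second hidden vectors and applied element-wise to the previous personality embedding vector. This is a minimal sketch under stated assumptions: the sigmoid gate, the linear map (`weights`, `bias`), and all function names are hypothetical and are not taken from the application, Lu, or Teng.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def controller_vector(prev_h1, prev_h2, weights, bias):
    """Compute a gate in (0, 1) per personality dimension from the
    concatenated previous hidden vectors (assumed linear map + sigmoid)."""
    joined = prev_h1 + prev_h2
    return [sigmoid(sum(w * v for w, v in zip(row, joined)) + b)
            for row, b in zip(weights, bias)]

def update_personality(prev_personality, controller):
    """Keep a fraction of each component of the previous personality vector."""
    return [c * p for c, p in zip(controller, prev_personality)]

if __name__ == "__main__":
    gate = controller_vector([0.2, -0.1], [0.4, 0.3],
                             [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0]],
                             [0.0, 0.0])
    print(gate, update_personality([2.0, -4.0], gate))
```

Because each gate value lies strictly between 0 and 1, repeated application can only shrink the magnitude of the personality vector, which is one way a per-step decay could be realized.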

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEDNEL CADEAU whose telephone number is (571)270-7843. The examiner can normally be reached Mon-Fri 9:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chieh Fan, can be reached at 571-272-3042. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/WEDNEL CADEAU/
Primary Examiner, Art Unit 2632
November 25, 2022