Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s response to the last office action, filed June 14, 2022, has been entered and made of record. Claims 1, 8, and 15 have been amended. Claims 1-20 are pending in this application.
In view of Applicant’s amendment, the rejection of claims 1-20 under 35 U.S.C. 112(b) has been withdrawn.

Response to Arguments
Applicant's arguments filed 06/14/2022 have been fully considered but they are not persuasive.
-- Applicant asserted (Page 10, 1st paragraph) that none of the cited portions of Baker captions a photograph in a narrative style particular to a specific user.
	The Examiner respectfully disagrees, because Baker clearly discloses that the evaluator 330 can evaluate the sentences obtained from the set of sentence generation modules 328 to select a sentence to autocaption with the image 314, [i.e., generating a caption for the visual media], by comparing the sentences to one another relative to one or more parameters to rank the sentences, where the one or more parameters relate to sentence style (e.g., writing style), sentence detail and/or repetitiveness, user preferences, and/or user feedback, among others, [i.e., the user's personal narrative style], (see at least: col. 8, lines 56-67).
	For the reasons stated above, the rejection of claim 1 was proper and is maintained.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 8-9, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al (US-PGPUB 2017/0177623) in view of Baker (US Patent 9,317,531), and further in view of Mishra et al (US Patent 9,972,309), and further in view of Weiner (US-PGPUB 2014/0039893).



In regards to claim 1, Chen et al discloses a method, comprising: 
accessing visual media, (405 in Fig. 4, and Para 0035, a social media post is received at 405, [which enables accessing the social media]);
analyzing information associated with the visual media to identify a context of the visual media, (415 in Fig. 4, and Para 0036, collecting the reference data which could be used to identify activities associated with the social media post, “e.g., location category”, [i.e., context of the visual media]. Further, Para 0039 discloses that the content of the social media post may be analyzed to determine a location, [i.e., context of the visual media], where the social media post was authored or captured, using for example, one or more of text recognition, voice recognition, object recognition, or any other recognition techniques);
providing the context to the personalized language model; and generating a caption for the visual media using the personalized language model and the context, (steps 410, 415, 420, 425, and Paragraphs 0040-0042, setting parameters of a language model to be used for caption generation based on inferred topics, based implicitly on social media that is determined to be associated with location information at step 420, [i.e., the location information corresponds to the context]. Further, step 430 in Fig. 4, and Para 0046, disclose generating a caption based on the reference data relevant to determining common activities occurring at a location category and the inferred topics using the identified language model, [i.e., generating a caption for the visual media using the personalized language model and the context, based on the provided reference data or the context of the visual media to the language model]).

Chen et al does not expressly disclose receiving a request to generate a personalized language model configured to reflect a personal narrative style of the user; retrieving public-facing language associated with the first user; using the public-facing language to build the personalized language model, the personalized language model reflecting a vocabulary, a sentence or phrase structure, and a sentence construction specific to the user; and that the caption for the visual media is generated in the user's personal narrative style.
However, Mishra et al discloses generating a personalized language model configured to reflect a personal narrative style of the user, (see at least: col. 2, lines 1-10, the personalized natural language generation model provides communication style which can be specific to the identified user. For example, the personalized natural language generation model can provide vocabulary specific to the user's profession, “narrative style of the user”, location, place of education, place of birth, ethnicity, socio-economic class, or other demographic data, as well as providing speech with a prosody, accent, “narrative style of the user”, or other sound variation corresponding to the identified user, [i.e., implicitly generating a personalized language model configured to reflect a personal narrative style of the user]).
Mishra et al further discloses retrieving public-facing language associated with the user; and using the public-facing language to build the personalized language model, the personalized language model reflecting a vocabulary, a sentence or phrase structure, (see at least: Fig. 3, and col. 10, lines 11-42, the quotations can be retrieved from resources such as social media 306, novels 308 and literary narratives, television scripts 310, blogs 312, and articles 314, such as newspaper and other websites. The quotations sought for can be specific to a demographic or demographics of a user, be defined by a specific language or accent, age, geography, education level, socio-economic status, or use other characteristics of the user, [i.e., retrieving public-facing language associated with the user]. Further, the server 302, such as the system 206 of FIG. 2, builds a personalized natural language generation model by searching for quotations across a network or database such as the Internet 304, [i.e., using the public-facing language to build the personalized language model]).
Mishra et al further discloses, in col. 1, lines 63-67, that the personalized natural language generation model can provide vocabulary specific to the user's profession, location, place of education, place of birth, ethnicity, socio-economic class, or other demographic data, corresponding to the identified user, [i.e., the personalized language model implicitly reflects a vocabulary specific to the user]. Further, in col. 5, lines 25-33, the system producing the personalized natural language generation model can count: periods, commas, semicolons, exclamation marks, 1st person singular pronouns, 1st person plural pronouns, combined 1st person singular and plural pronouns, negative particles, numbers, prepositions, pronouns, question marks, words longer than six letters, total quotes, 2nd person singular pronouns, positive words, negative words, nouns, verbs, adjectives, and/or adverbs, [i.e., the personalized language model implicitly reflects a sentence or phrase structure]. Furthermore, Mishra et al discloses in col. 6, lines 14-23, performing a syntactic analysis, which can use automatic semantic processing of the collected corpus of quotations/utterances to automatically generate semantic representations for each utterance in the corpus, such that the style can be separated from content, then used for modeling personalized natural language generation in spoken systems. For example, if a user is a teenager, the model uses a style and vocabulary of a teenager category to generate responses for the teenage user. A different style and vocabulary will be used to address an elderly person, [i.e., the personalized language model implicitly reflects a sentence construction specific to the user based on using style and vocabulary for different users].
Chen and Mishra et al are combinable because they are both concerned with visual media description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify Chen to use the server for searching quotations across a network or database such as the Internet, as taught by Mishra et al, in order to build a personalized natural language generation model, (Mishra, col. 10, lines 17-19).
Although Chen discloses generating a caption for the visual media using the personalized language model and the context, the combined teaching of Chen and Mishra as a whole does not expressly disclose that the caption for the visual media is generated in the user's personal narrative style, or receiving a request to generate a personalized language model.
Baker discloses generating a caption for the visual media in the user's personal narrative style, (see at least: col. 8, lines 56-67, the evaluator 330 can evaluate the sentences obtained from the set of sentence generation modules 328 to select a sentence to autocaption with the image 314, “generating caption for the visual media”, using parameters relating to sentence style (e.g., writing style), sentence detail and/or repetitiveness, user preferences, and/or user feedback, among others, “user's personal narrative style”, [i.e., generating a caption for the visual media in the user's personal narrative style]).
Chen, Mishra et al, and Baker are combinable because they are all concerned with visual media description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Chen and Mishra to use the set of sentence generation modules 328, as taught by Baker, in order to produce a sentence caption, (Baker, col. 1, lines 40-41).
Although Mishra et al discloses generating a personalized language model configured to reflect a personal narrative style of the user, (see at least: col. 2, lines 1-10), the combined teaching of Chen, Mishra et al, and Baker as a whole does not expressly disclose receiving a request to generate a personalized language model.
Weiner discloses receiving a request to generate a personalized language model, (see Fig. 5, and Par. 0050, receiving a request by the user to access the multi-user service. In step 504, the multi-user service can receive voice information from the identified user. The voice information can correspond to an utterance made by the user and captured via a voice user interface. In step 506, a personalized language model can be retrieved for the identified user, [i.e., implicitly receiving a request to generate a personalized language model based on voice information from the identified user]).
Chen, Mishra et al, Baker, and Weiner are combinable because they are all concerned with visual media description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Chen, Mishra et al, and Baker to use the voice information from the identified user from the multi-user service, as taught by Weiner, in order to retrieve the personalized language model for the identified user, (Weiner, Par. 0050).

In regards to claim 2, the combined teaching of Chen, Mishra et al, Baker, and Weiner as a whole discloses the limitations of claim 1.
Furthermore, Mishra et al discloses wherein the request to generate the personalized language model includes representative target language, and the representative target language is used to supplement the retrieved public facing language, (Mishra et al, see at least: Fig. 3, and col. 10, lines 16-42, building a personalized natural language generation model by searching for quotations across a network or database such as the Internet 304. The quotations can be retrieved from resources such as social media 306, novels 308 and literary narratives, television scripts 310, blogs 312, and articles 314, such as newspaper and other websites, [i.e., retrieving public facing language]. Further, the personality independent quotation lattice 324 is used to create a personalized natural language generation model which can provide the user synthetic speech and/or text in a personalized manner, [i.e., representative target language, which is implicitly used to supplement the retrieved quotations from the social media, “public facing language”]). 

Regarding claim 8, claim 8 recites substantially similar limitations as set forth in claim 1. As such, claim 8 is rejected for at least a similar rationale.
The Examiner further acknowledged the following additional limitation(s): “a non-transitory computer readable medium storing instruction(s) that, when executed, cause a processor to”. However, Chen discloses the “non-transitory computer-readable medium storing instructions”, (Chen, see at least: Par. 0076, Computing device 705 can use and/or communicate using computer-usable or computer-readable media).

Regarding claim 9, claim 9 recites substantially similar limitations as set forth in claim 2. As such, claim 9 is rejected for at least a similar rationale.

Regarding claim 15, claim 15 recites substantially similar limitations as set forth in claim 1. As such, claim 15 is rejected for at least a similar rationale.
The Examiner further acknowledged the following additional limitation(s): “an apparatus comprising: a hardware processor; and a non-transitory computer-readable medium storing instruction(s) that, when executed, cause the processor to”. However, Chen et al discloses “an apparatus comprising: a hardware processor; and a non-transitory computer-readable medium storing instructions that, when executed, cause the processor to”, (see at least: Par. 0007, “server apparatus”).

Regarding claim 16, claim 16 recites substantially similar limitations as set forth in claim 2. As such, claim 16 is rejected for at least a similar rationale.

Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, Mishra et al, Baker, and Weiner, as applied to claim 1 above, and further in view of Bostick et al (US-PGPUB 2018/0137604).

In regards to claim 6, the combined teaching of Chen, Mishra et al, Baker, and Weiner as a whole discloses the limitations of claim 1.
Furthermore, Chen discloses wherein the public-facing language includes captions and one or more other items, (Par. 0019, at least one device 135 may also analyze the social media posts 140 by the user 150 to detect location categories (and/or business categories) and caption content associated with the social media post 140, [i.e., the public-facing language includes captions and one or more other items]); and building the personalized language model comprises the captions, (steps 410, 415, 420, 425, and Paragraphs 0040-0042, setting parameters of a language model to be used for caption generation based on inferred topics, [i.e., the personalized language model implicitly comprises the captions]).
The combined teaching of Chen, Mishra et al, Baker, and Weiner as a whole does not expressly disclose weighting the captions more heavily than the one or more other items.
However, Bostick et al discloses weighting the captions more heavily than the one or more other items, (see at least: Fig. 4, and Par. 0056, the relationship to the viewing user may have a weighting of 2.0×, the relevance to the viewing user may have a weighting of 1.5×, the relevance to the caption may have a weighting of 1.7×, the sentiment of the viewing user to similar photographs may have a weighting of 1.0×, and the sentiment of aggregate social network users may have a weighting of 0.6×, [i.e., weighting the captions more heavily than the one or more other items, “relevance to the viewing user, the sentiment of the viewing user to similar photographs, and the sentiment of aggregate social network users”]).
Chen, Mishra et al, Baker, Weiner, and Bostick are combinable because they are all concerned with visual media description. Therefore, it would have been obvious to a person of ordinary skill in the art to modify the combined teaching of Chen, Mishra et al, Baker, and Weiner to use focus module 123, as taught by Bostick, in order to assign higher weightings to the captions compared to at least one or more of the relevance to the viewing user, the sentiment of the viewing user to similar photographs, and the sentiment of aggregate social network users, (Bostick, Par. 0054).

Regarding claim 13, claim 13 recites substantially similar limitations as set forth in claim 6. As such, claim 13 is rejected for at least a similar rationale.

Regarding claim 20, claim 20 recites substantially similar limitations as set forth in claim 6. As such, claim 20 is rejected for at least a similar rationale.

Allowable Subject Matter
Claims 3-5, 7, 10-12, 14, and 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

With respect to claim 3, the prior art of record, alone or in reasonable combination, does not teach or suggest, the following limitation(s), (in consideration of the claim as a whole):  
“wherein some of the public-facing language is machine-generated language, and the machine-generated language is excluded from use when building the personalized language model”.

The relevant prior art of record, Mishra et al (US Patent 9,972,309), discloses retrieving public-facing language associated with the first user; and using the public-facing language to build the personalized language model, and the personalized language model reflecting a vocabulary, a sentence or phrase structure, (see at least: Fig. 3, and col. 10, lines 11-42, the quotations can be retrieved from resources such as social media 306, novels 308 and literary narratives, television scripts 310, blogs 312, and articles 314, such as newspaper and other websites. The quotations sought for can be specific to a demographic or demographics of a user, be defined by a specific language or accent, age, geography, education level, socio-economic status, or use other characteristics of the user, [i.e., retrieving public-facing language associated with the first user]. Further, the server 302, such as the system 206 of FIG. 2, builds a personalized natural language generation model by searching for quotations across a network or database such as the Internet 304, [i.e., using the public-facing language to build the personalized language model]). However, while disclosing the retrieving public-facing language associated with the user; and using the public-facing language to build the personalized language model; Mishra et al fails to teach or suggest, either alone or in combination with the other cited references, wherein some of the public-facing language is machine-generated language, and the machine-generated language is excluded from use when building the personalized language model.

With respect to claim 4, the prior art of record, alone or in reasonable combination, does not teach or suggest, the following limitation(s), (in consideration of the claim as a whole):  
“determining through extrinsic evidence that the user approves of the machine-generated language; and including the machine-generated language when building the personalized language model”.

The relevant prior art of record, Mishra et al (US Patent 9,972,309), discloses using the public-facing language to build the personalized language model, such that the personalized language model reflects vocabulary, a sentence or phrase structure, (see at least: Fig. 3, and col. 10, lines 11-42, the server 302, such as the system 206 of FIG. 2, builds a personalized natural language generation model by searching for quotations across a network or database such as the Internet 304, [i.e., using the public-facing language to build the personalized language model]); but fails to teach or suggest, either alone or in combination with the other cited references, determining through extrinsic evidence that the user approves of the machine-generated language; and including the machine-generated language when building the personalized language model.

With respect to claim 5, the prior art of record, alone or in reasonable combination, does not teach or suggest, the following limitation(s), (in consideration of the claim as a whole):  
“determining that the retrieved public-facing language is insufficient to build the personalized language model; identifying one or more associated users that are connected to the user in a social networking graph; and supplementing the retrieved public-facing language with public-facing language of the one or more associated users”.

The relevant prior art of record, Mishra et al (US Patent 9,972,309), discloses retrieving public-facing language associated with the first user; and using the public-facing language to build the personalized language model, and the personalized language model reflecting a vocabulary, a sentence or phrase structure, (see at least: Fig. 3, and col. 10, lines 11-42, the quotations can be retrieved from resources such as social media 306, novels 308 and literary narratives, television scripts 310, blogs 312, and articles 314, such as newspaper and other websites. The quotations sought for can be specific to a demographic or demographics of a user, be defined by a specific language or accent, age, geography, education level, socio-economic status, or use other characteristics of the user, [i.e., retrieving public-facing language associated with the first user]. Further, the server 302, such as the system 206 of FIG. 2, builds a personalized natural language generation model by searching for quotations across a network or database such as the Internet 304, [i.e., using the public-facing language to build the personalized language model]). However, while disclosing the retrieving public-facing language associated with the user; and using the public-facing language to build the personalized language model; Mishra et al fails to teach or suggest, either alone or in combination with the other cited references, determining that the retrieved public-facing language is insufficient to build the personalized language model; identifying one or more associated users that are connected to the user in a social networking graph; and supplementing the retrieved public-facing language with public-facing language of the one or more associated users, (as combined with the other claimed limitations).



With respect to claim 7, the prior art of record, alone or in reasonable combination, does not teach or suggest, the following limitation(s), (in consideration of the claim as a whole):  
“assigning a confidence score to the test caption based on one or more of whether the caption makes grammatical sense, whether the personalized language model was able to find suitable user-specific vocabulary, or whether the personalized language model was able to identify a suitable user-specific sentence structure; comparing the confidence score to a predetermined threshold; and when the confidence score does not exceed the predetermined threshold, retrieving supplemental public-facing language to train the personalized language model”.

The relevant prior art of record, Chen et al (US-PGPUB 2017/0177623), discloses providing the context to a personalized language model; and generating a caption for the visual media using the personalized language model, (steps 410, 415, 420, 425, and Paragraphs 0040-0042, setting parameters of a language model to be used for caption generation based on inferred topics; and step 430 in Fig. 4, and Para 0046, disclose generating a caption based on the reference data relevant to determining common activities occurring at a location category and the inferred topics using the identified language model, [i.e., the reference data or the context of the visual media is provided to the language model, and said language model is used for generating the caption for the visual media]); but fails to teach or suggest, either alone or in combination with the other cited references, assigning a confidence score to the test caption based on one or more of whether the caption makes grammatical sense, whether the personalized language model was able to find suitable user-specific vocabulary, or whether the personalized language model was able to identify a suitable user-specific sentence structure, (as combined with the other claimed limitations).

A further prior art of record, Bostick et al (US-PGPUB 2018/0137604), discloses assigning a confidence score to the test caption to determine the single focus for the uploaded photograph, (see at least: Fig. 4, and Par. 0056, the relationship to the viewing user may have a weighting of 2.0×, the relevance to the viewing user may have a weighting of 1.5×, the relevance to the caption may have a weighting of 1.7×, the sentiment of the viewing user to similar photographs may have a weighting of 1.0×, and the sentiment of aggregate social network users may have a weighting of 0.6×); but fails to teach or suggest, either alone or in combination with the other cited references, assigning a confidence score to the test caption based on one or more of whether the caption makes grammatical sense, whether the personalized language model was able to find suitable user-specific vocabulary, or whether the personalized language model was able to identify a suitable user-specific sentence structure, (as combined with the other claimed limitations).

Regarding claim 10, claim 10 recites substantially similar limitations as set forth in claim 3. As such, claim 10 is in condition for allowance, for at least similar reasons.

Regarding claim 11, claim 11 recites substantially similar limitations as set forth in claim 4. As such, claim 11 is in condition for allowance, for at least similar reasons.

Regarding claim 12, claim 12 recites substantially similar limitations as set forth in claim 5. As such, claim 12 is in condition for allowance, for at least similar reasons.

Regarding claim 14, claim 14 recites substantially similar limitations as set forth in claim 7. As such, claim 14 is in condition for allowance, for at least similar reasons.

Regarding claim 17, claim 17 recites substantially similar limitations as set forth in claim 3. As such, claim 17 is in condition for allowance, for at least similar reasons.

Regarding claim 18, claim 18 recites substantially similar limitations as set forth in claim 4. As such, claim 18 is in condition for allowance, for at least similar reasons.

Regarding claim 19, claim 19 recites substantially similar limitations as set forth in claim 5. As such, claim 19 is in condition for allowance, for at least similar reasons.





Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)272-0273. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/AMARA ABDI/
Primary Examiner, Art Unit 2668
10/04/2022