DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Status of Claims
Claims 1-20 are pending in this application.

CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
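Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.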
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 4-7, 9-11, 18, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
Regarding claims 4-7, 9-11, 18, and 19, the recited equations render the claims indefinite because the parameters and variables in the equations are not defined by the claims, and one of ordinary skill in the art would therefore not be reasonably apprised of the scope of the invention.

Examiner’s note
Claim 1 recites computer-readable media having computer instructions stored thereon, and Applicant’s specification defines the computer storage media as non-transitory, such that it does not comprise a signal per se (specification, paragraph [0114]). Therefore, claims 1-11 are not rejected under 35 USC § 101.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 8-10, 12-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Paulus et al. (“A Deep Reinforced model for Abstractive Summarization”, arXiv: 2017, hereinafter Paulus) in view of Zhou et al. (“Exploring Contextual Word-level Style relevance for unsupervised style transfer”, published May 5, 2020, hereinafter Zhou).
Regarding Claim 1, Paulus discloses one or more computer-readable media having computer instructions stored thereon for execution by one or more processors, wherein execution of the computer instructions by the one or more processors provides a method for stylistic expression transfer, the method comprising: 
receiving source sequence data, the source sequence data including a source sentence of one or more words (pp. 2, Fig. 1 and section 2.1, receiving input sequence includes a plurality of words);

encoding the source sequence data as one or more time steps, the one or more time steps corresponding to the one or more words of the source sentence (pp. 2-3, Fig. 1 and section 2.1, intra-temporal attention function ensures that different parts of the encoded input sequence are used);  
selecting the word based on the decoding  (pp. 3 and 4, section 2.2, intra-decoder attention and section 2.3, token generation and pointer, selecting a token based on ground-truth value and corresponding index); and 
generating target sequence data, the target sequence data including a target sentence that includes the word selected for each of the one or more time steps based on the decoding, wherein the target sequence data is different from the source sequence data (pp. 2, Fig. 1, generating target sequence data in which the target word ‘expanded’ is assigned an attention score of 0.8, while the source word ‘became’ is assigned an attention score of 0.1).
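Paulus does not explicitly teach, but Zhou discloses, the following limitation:
for each of the one or more time steps: decoding the time step by determining a word having a content preservation loss value that is less than at least one other word for the time step and having a style transfer loss value that is less than the at least one other word for the time step (pp. 4, Fig. 1 and pp. 5, Left Column, section 3.2, stage 2, Fine-tune the Extended Model; pp. 5, Right Column, section 3.2.2, the Objective Function is used for fine-tuning; and calculating the style transfer loss function; “each word is defined as the weighted sum of word embeddings with the prediction probability at the current timestep”; and pp. 5, Right Column – pp. 6, Left Column, section 3.2.2, calculating the Content preservation loss function).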
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the RNN-implemented methods and systems for producing the intra-decoder attention vector and processing the vector to emit a summary token, as taught by Paulus, with the method of using a carefully designed objective function involving a style transfer loss function and a content preservation loss function, as taught by Zhou, in order to achieve state-of-the-art performance in terms of both transfer accuracy and content preservation (Zhou, Abstract).
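For illustration only, the per-time-step selection recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of the implementation of Paulus, Zhou, or the application: the function name select_word, the candidate word ‘grew’, all loss values, and the tie-breaking rule (smallest sum of the two losses) are hypothetical.

```python
from typing import Dict, Tuple


def select_word(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Pick one word for a single time step.

    candidates maps each word to a hypothetical pair
    (content_preservation_loss, style_transfer_loss).
    A word is eligible if both of its losses are lower than those of
    at least one other candidate. Among eligible words, the one with
    the smallest summed loss is returned (an assumed tie-breaking rule).
    """
    def beats_another(word: str) -> bool:
        cp, st = candidates[word]
        return any(
            cp < other_cp and st < other_st
            for other, (other_cp, other_st) in candidates.items()
            if other != word
        )

    eligible = [word for word in candidates if beats_another(word)]
    if not eligible:
        raise ValueError("no candidate is lower than another on both losses")
    return min(eligible, key=lambda word: sum(candidates[word]))


# Hypothetical loss values for one time step.
step_candidates = {
    "became": (0.9, 1.2),
    "expanded": (0.3, 0.4),
    "grew": (0.5, 0.8),
}
chosen = select_word(step_candidates)
```

In this sketch a single-candidate time step raises an error, because the limitation requires a comparison against at least one other word.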
Regarding Claim 2, Paulus in view of Zhou discloses the media of claim 1, and Paulus further discloses:
wherein decoding the time step further comprises determining a vocabulary probability distribution (pp. 3, section 2.3, Token generation and pointer, token-generation layer generates probability distribution).
Regarding Claim 3, Paulus in view of Zhou discloses the media of claim 1, and Paulus further discloses:
wherein decoding the time step further comprises determining a words probability distribution, and wherein the word selected based on the decoding has a highest overall probability distribution relative to the at least one other word at the time step (pp. 4, 1st paragraph, calculating probability distribution for the output token and determining an out-of-vocabulary token based on the value).
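For illustration only, determining a probability distribution over candidate words and selecting the word with the highest probability (claims 2 and 3) can be sketched as follows. This is a minimal sketch, assuming a softmax over raw scores; the function names, the word ‘grew’, and the score values are hypothetical and are not taken from Paulus or the application.

```python
import math
from typing import Dict


def vocabulary_distribution(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-word scores into a probability distribution (softmax)."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {word: math.exp(score - peak) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def most_probable_word(distribution: Dict[str, float]) -> str:
    """Return the word with the highest probability in the distribution."""
    return max(distribution, key=distribution.get)


# Hypothetical raw scores for one decoding time step.
scores = {"became": 0.2, "expanded": 2.1, "grew": 1.0}
probs = vocabulary_distribution(scores)
selected = most_probable_word(probs)
```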
Regarding Claim 4, Paulus in view of Zhou discloses the media of claim 1, and Paulus further discloses:
wherein decoding the time step further comprises determining a cross entropy loss value for the word, wherein the cross entropy loss value is calculated using a cross entropy loss function, wherein the cross entropy loss function is Lml =
[equation image not reproduced: media_image1.png]
(Paulus, pp. 4, section 3.1, the maximum-likelihood training objective is the minimization of loss:
[equation image not reproduced: media_image2.png]
).
Regarding Claim 5, Paulus in view of Zhou discloses the media of claim 1, and Paulus further discloses:
wherein the word having the content preservation loss value that is less than the at least one other word for the time step is calculated using a content preservation loss function, wherein the content preservation loss function Lcp may be expressed as
[equation image not reproduced: media_image3.png]
(Paulus, pp. 4-5, section 3.2, Policy Learning, learning a policy that maximizes a specific discrete metric:
[equation image not reproduced: media_image4.png]
).

Regarding Claim 8, Paulus in view of Zhou discloses the media of claim 1, and Paulus further discloses:
applying an overall loss function at the time step in order to train an encoder-decoder model (pp. 2, section 2, 1st paragraph, Neural intra-attention model based on the encoder-decoder network).
Regarding Claim 9, Paulus in view of Zhou discloses the media of claim 8.
Paulus does not explicitly teach, however Zhou discloses:
wherein the overall loss function is Loss = βLcp + γLts (pp. 5, Right Column, section 3.2.2, the objective function is derived from the sum of Lcp, the content preservation loss, and Lst, the style transfer loss; α, β, γ are balancing parameters).
Regarding Claim 10, Paulus in view of Zhou discloses the media of claim 8.
Paulus does not explicitly teach, however Zhou discloses:
wherein the overall loss function is Loss = αLml + βLcp + γLts (pp. 5, Right Column, section 3.2.2, the objective function is derived from the sum of Lml, the frequency modeling loss, Lcp, the content preservation loss, and Lst, the style transfer loss; α, β, γ are balancing parameters).
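For illustration only, the weighted-sum form of the overall loss recited in claims 9 and 10 can be sketched as follows. This is a minimal sketch: the function name overall_loss, the default weights, and all numeric values are hypothetical, and the individual loss terms are taken as already-computed inputs because their defining equations are not reproduced in this action.

```python
def overall_loss(l_ml: float, l_cp: float, l_ts: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted sum Loss = alpha*Lml + beta*Lcp + gamma*Lts.

    alpha, beta and gamma are the balancing parameters; setting
    alpha to 0 reduces the expression to beta*Lcp + gamma*Lts.
    """
    return alpha * l_ml + beta * l_cp + gamma * l_ts


# Hypothetical loss values and balancing parameters.
three_term = overall_loss(0.5, 0.2, 0.4, alpha=1.0, beta=0.5, gamma=2.0)
two_term = overall_loss(0.5, 0.2, 0.4, alpha=0.0, beta=0.5, gamma=2.0)
```

The two-term form of claim 9 is obtained here simply by zeroing the first weight, which is one possible reading and not the only one.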
Regarding Claim 12, Paulus discloses a method for implementing stylistic expression transfer, the method comprising: 
receiving source sequence data, the source sequence data including a source sentence of one or more words (pp. 2, Fig. 1 and section 2.1, receiving input sequence includes a plurality of words);
encoding the source sequence data as one or more time steps, the one or more time steps corresponding to the one or more words of the source sentence, wherein 
for each of the one or more time steps in the compressed representation, decoding the time step by determining loss values for at least one word available for selection, wherein decoding comprises (pp. 3, section 2.2, intra-decoder attention, for each decoding step t, our model computes a new decoder context vector; section 2.3, token generation and pointer, selecting a token based on ground-truth value and corresponding index);
for each of the one or more time steps, selecting the at least one word based on the [content preservation loss value] and the [style transfer loss value] calculated for the at least one word being less than a [content preservation loss value] and a [style transfer loss value] calculated for the at least one other word for the time step (pp. 3 and 4, section 2.2, intra-decoder attention and section 2.3, token generation and pointer, selecting a token based on ground-truth value and corresponding index); and 
generating target sequence data, the target sequence data including a target sentence that includes the at least one word selected for the one or more time steps, wherein the target sequence data is different from the source sequence data (pp. 2, Fig. 1, generating target sequence data in which the target word ‘expanded’ is assigned an attention score of 0.8, while the source word ‘became’ is assigned an attention score of 0.1).
Paulus does not explicitly teach, but Zhou discloses, the bracketed limitations:

for each of the one or more time steps, selecting the at least one word based on the [content preservation loss value] and the [style transfer loss value] calculated for the at least one word being less than a [content preservation loss value] and a [style transfer loss value] calculated for the at least one other word for the time step; calculating a content preservation loss value for the at least one word available for selection is less than at least one other word for the time step (pp. 4, Fig. 1 and pp. 5, Left Column, section 3.2, stage 2, Fine-tune the Extended Model, pp. 5, Right Column – pp. 6, Left Column, section 3.2.2, calculating the Content preservation loss function); and
calculating a style transfer loss value for the at least one word available for selection that is less than the at least one other word for the time step (pp. 4, Fig. 1 and pp. 5, Left Column, section 3.2, stage 2, Fine-tune the Extended Model, pp. 5, Right Column, section 3.2.2, calculating the style transfer loss function; “each word is defined as the weighted sum of word embeddings with the prediction probability at the current timestep”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the RNN-implemented methods and systems for producing the intra-decoder attention vector and processing the vector to emit a summary token, as taught by Paulus, with the method of using a carefully designed objective function involving a style transfer loss function and a content preservation loss function, as taught by Zhou, in order to achieve state-of-the-art performance in terms of both transfer accuracy and content preservation (Zhou, Abstract).
Regarding Claim 13, Paulus in view of Zhou discloses the method of claim 12.
Paulus does not explicitly teach, however Zhou discloses:
determining an overall loss value for the at least one word at each of the one or more time steps, wherein the overall loss value is determined by the function is Loss = αLml + βLcp + γLts (pp. 5, Right Column, section 3.2.2, the objective function is derived from the sum of Lml, the frequency modeling loss, Lcp, the content preservation loss, and Lst, the style transfer loss; α, β, γ are balancing parameters).
Regarding Claim 14, Paulus in view of Zhou discloses the method of claim 12, and Paulus further discloses:
for each of the one or more time steps, calculating a words probability distribution (pp. 4, 1st paragraph, calculating probability distribution for the output token).
Regarding Claim 15, Paulus in view of Zhou discloses the method of claim 12.
Paulus does not explicitly teach, however Zhou discloses:
mapping each of the one or more words of the source sequence data to an embedding space (pp. 3, Right Column, section 3.1.1 Attentional Seq2seq model, mapping the input words into a hidden state sequence as embedding vector and the hidden state of the words).
Regarding Claim 16, Paulus in view of Zhou discloses the method of claim 12.
Paulus does not explicitly teach, however Zhou discloses:

Regarding Claim 17, Paulus in view of Zhou discloses the method of claim 12.
Paulus does not explicitly teach, however Zhou discloses:
determining attention weights for each of the one or more hidden states (pp. 3, Right Column, section 3.1.1 Attentional Seq2seq model, calculating weighted sum of all hidden states of input words).
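For illustration only, determining attention weights for hidden states and forming their weighted sum (claim 17) can be sketched as follows. This is a minimal sketch, assuming softmax-normalized alignment scores; the function names, the scores, and the two-dimensional hidden states are hypothetical and are not taken from Zhou or the application.

```python
import math
from typing import List, Sequence


def attention_weights(scores: Sequence[float]) -> List[float]:
    """Normalize one alignment score per hidden state into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def context_vector(weights: Sequence[float],
                   hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the hidden states, computed dimension by dimension."""
    dim = len(hidden_states[0])
    return [
        sum(weight * state[i] for weight, state in zip(weights, hidden_states))
        for i in range(dim)
    ]


# Hypothetical hidden states (one per input word) and alignment scores.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights([0.1, 0.1, 2.0])
context = context_vector(weights, hidden)
```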
Regarding Claim 20, Paulus discloses a computer system comprising:
a means for:
receiving source sequence data, the source sequence data including a source sentence of one or more words (pp. 2, Fig. 1 and section 2.1, receiving input sequence includes a plurality of words);
encoding the source sequence data as one or more time steps, the one or more time steps corresponding to the one or more words of the source sentence (pp. 2-3, Fig. 1 and section 2.1, intra-temporal attention function ensures that different parts of the encoded input sequence are used);  
selecting the word based on the decoding  (pp. 3 and 4, section 2.2, intra-decoder attention and section 2.3, token generation and pointer, selecting a token based on ground-truth value and corresponding index); and
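generating target sequence data, the target sequence data including a target sentence that includes the word selected for each of the one or more time steps based on the decoding, wherein the target sequence data is different from the source sequence data (pp. 2, Fig. 1, generating target sequence data in which the target word ‘expanded’ is assigned an attention score of 0.8, while the source word ‘became’ is assigned an attention score of 0.1).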
Paulus does not explicitly teach, but Zhou discloses, the following limitation:
for each of the one or more time steps: decoding the time step by determining a word having a content preservation loss value that is less than at least one other word for the time step and having a style transfer loss value that is less than the at least one other word for the time step (pp. 4, Fig. 1 and pp. 5, Left Column, section 3.2, stage 2, Fine-tune the Extended Model; pp. 5, Right Column, section 3.2.2. the Objective Function is used for fine-tuning; and calculating the style transfer loss function; “each word is defined as the weighted sum of word embeddings with the prediction probability at the current timestep”; and pp. 5, Right Column – pp. 6, Left Column, section 3.2.2, calculating the Content preservation loss function).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the RNN-implemented methods and systems for producing the intra-decoder attention vector and processing the vector to emit a summary token, as taught by Paulus, with the method of using a carefully designed objective function involving a style transfer loss function and a content preservation loss function, as taught by Zhou, in order to achieve state-of-the-art performance in terms of both transfer accuracy and content preservation (Zhou, Abstract).

Allowable Subject Matter
Claims 6, 7, 11, 18, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and if rewritten or amended to overcome the rejection(s) under 35 U.S.C. 112(b).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see attached form PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEONG-AH A. SHIN whose telephone number is (571)272-5933.  The examiner can normally be reached from 9 AM to 3 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on 571-272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  


Seong-ah A. Shin
Primary Examiner
Art Unit 2659



/SEONG-AH A SHIN/           Primary Examiner, Art Unit 2659