Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/01/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Note that the following invokes 35 U.S.C. 112(f):
Claim 8 recites the phrases “means for separating words,” “means for deriving subword embedding vector values,” “means for performing syntagma-based position encoding,” and “means for performing sentence embedding calculation,” all of which use the generic placeholder and transition “means for,” are modified by the functional language “separating,” “deriving,” “performing,” and “performing,” and are not modified by sufficient structure.
Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4, 5, 8, 11, and 12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The independent claims 1  determining position encoding values by performing syntagma-based position encoding using fixed weights according to word positions in the sentence after the deriving of subword embedding vector values in order for sentence embedding calculation; and performing sentence embedding calculation from the subword embedding vector values and the position encoding values.” 
The limitations of “separating…”, “extracting…”, “deriving…”, “determining…”, and “performing…”, as drafted, cover a human performing mental processes and utilizing mathematical concepts. More specifically, a human can read sentences in the mind, identify words and subwords in those sentences, assign values to those words and subwords as well as values according to their positions in the sentence, and then perform a mathematical calculation using those values. 
This judicial exception is not integrated into a practical application. In particular, claims 1 and 8 recite the additional elements “apparatus” and “method” as per the independent claims. For example, the filed specification describes using a “hardware element including at least one of a digital signal processor (DSP), a processor, a controller, an application-specific integrated circuit (ASIC), a programmable logic element (a field programmable gate array (FPGA) or the like), and other electronic devices or a combination thereof” (Paragraph 59). Accordingly, these additional elements do not integrate the abstract 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations in the claims noted above are directed to insignificant extra-solution activity. The claims are not patent eligible.
With respect to Claims 4/1 and 11/8, the claims relate to “syntagma-based position encoding comprises applying differentiated weights to full morphemes and bound morphemes constituting the sentence.” This reads on calculating with different variables and numbers. No additional limitations are present.
With respect to Claims 5/1 and 12/8, the claims relate to “performing of sentence embedding calculation comprises multiplying the subword embedding vector values and the position encoding values together and calculating an average of the products regarding the whole sentence.” This reads on calculating by multiplying and averaging numbers. No additional limitations are present.
These dependent claims do not integrate the judicial exception into a practical application and do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Note: Method claims 1-7 and apparatus claims 8-14 are respectively related, as each claimed element’s function corresponds to the claimed apparatus function.
Claim(s) 1-2, 4, 6, 8-9, 11, and 13 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hashimoto et al. (US Patent Document US-2018/0121799-A1), hereinafter Hashimoto.
Regarding Claims 1 and 8, Hashimoto teaches:
A sentence embedding method for outputting a sentence embedding result from an input sentence on the basis of subword embedding and skip-thoughts, the method comprising: 

extracting subwords from the words determined in the separating of words; (Hashimoto, Figure 1, Page 3, Paragraph 52: Decomposed token embedder processes token decompositions [as extracted subwords] of the token at multiple scales) (Hashimoto, Page 4, Paragraph 59: The n-character-gram embedder 206 processes character substrings [as extracted subwords] of the word at multiple scales of substring length)
deriving subword embedding vector values by embedding the extracted subwords when the extracting of subwords is finished; (Hashimoto, Figure 1, Page 3, Paragraph 52: Decomposed token embedder maps [as embeds] each processed token decomposition into an intermediate vector [as subword embedding vector values] representing a position in a token decomposition embedding space) (Hashimoto, Page 4, Paragraph 59: The n-character-gram embedder 206 maps [as embeds] each processed character substring into an intermediate vector [as subword embedding vector values] representing a position in the character embedding space 208)

performing sentence embedding calculation from the subword embedding vector values and the position encoding values. (Hashimoto, Figure 1, Page 3, Paragraph 52: Decomposed token embedder combines [as calculates] the intermediate vectors for each unique processed token decomposition to produce token decomposition embedding vectors [as combined word embedding vector values and the position encoding values] for each of the tokens) (Hashimoto, Page 4, Paragraph 60: The n-character-gram embedder combines the intermediate vectors [as word embedding vector values and the position encoding values] to produce element wise averages [as sentence embedding calculation] in the character embedding vector)
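For orientation only, the kind of operation cited above (breaking a word into character substrings at several lengths, mapping each substring to a vector, and combining the vectors by element-wise averaging) can be sketched as follows. This is a minimal sketch, not Hashimoto's implementation: the function names, the toy two-dimensional lookup, and the omission of any word-boundary markers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def char_ngrams(word: str, sizes: Sequence[int] = (1, 2, 3)) -> List[str]:
    """Return the unique character n-grams of `word`, in first-seen order."""
    seen: Dict[str, None] = {}
    for n in sizes:
        for i in range(len(word) - n + 1):
            seen.setdefault(word[i:i + n], None)
    return list(seen)


def average_vectors(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def toy_embedding(ngram: str) -> List[float]:
    """Hypothetical stand-in for a learned n-gram embedding table."""
    return [float(len(ngram)), float(sum(ord(ch) for ch in ngram) % 7)]


# Character n-grams (n = 1, 2, 3) of the word "Cat", then one averaged vector.
grams = char_ngrams("Cat")
word_vector = average_vectors([toy_embedding(g) for g in grams])
```

In a trained system the lookup would be a learned embedding matrix rather than the arithmetic placeholder used here.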
Regarding Claims 2 and 9, Hashimoto teaches all of the limitations of Claims 1 and 8 above. Furthermore, Hashimoto teaches:

generating a subword table including a {word: subword set} dictionary and a {subword: vector value} table by separating words and extracting subwords from training text including consecutive sentence context; (Hashimoto, Page 4, Paragraph 57: Character embedder 206 constructs [as generates] the vocabulary [as word-subword dictionary] of the character n-grams [as subword] in the training data [as training text] and assigns an embedding [as vector value] for each character n-gram; Figure 2, Page 4, Paragraphs 57-58: character [as subword] embedding space 208 includes a 1, 2, 3 and 4-gram [as consecutive sentence context] embedding; example of character n-grams (n=1, 2, 3) of the word "Cat" [represent separated and extracted subwords]; each word is represented as word representation x [as word-subword-vector value table], 222, the concatenation of its corresponding word embedding 210 and character embedding 220; Page 4, Paragraph 63: vocabulary of the character n-grams is built on the training corpus, the case-sensitive English Wikipedia text [as training text]; Figure 1, Page 2, Paragraph 40: a part-of-speech (POS) label embedding layer 104a/b and a chunk/chunking label embedding layer [as separating/extracting subwords] 106a/b produce POS/chunk label embedding vectors and state vectors [as subword vector values] for each of the words)
generating subword embedding training data of {target word, contextual word} for subword embedding learning; (Hashimoto, Page 4, Paragraph 57: Character embedder 206  constructs [as generates] the vocabulary [as training data] of the character n-grams 
generating skip-thought sentence embedding training data of {target sentence, contextual sentence} for skip-thought sentence embedding learning; and (Hashimoto, Page 4, Paragraph 62: word embeddings [as sentence embedding training data] are trained using the skip-gram [as skip-thought sentence embedding learning] or the CBOW model with negative sampling; Page 4, Paragraph 64: for each word-context pair [as target-contextual sentence] in the training corpus, N negative context words are sampled using an objective function)
constructing a subword embedding and skip-thought sentence embedding integration model from the subword embedding training data and the skip-thought sentence embedding training data and generating subword embedding vector values which are final train results. (Hashimoto, Page 4, Paragraph 56: character embedder 206 uses a skip-gram [as skip-thought] model to train the word embedding matrix [and] a continuous bag-of-words (CBOW) [as sub-word] model to train the character embedding matrix; character n-gram embeddings are learned using the same skip-gram objective function as the word vectors; Page 4, Paragraph 59: The n-character-gram embedder 206 combines the intermediate vectors for each unique processed character substring to produce character [as subword] embedding vectors [as generated subword embedding vector values] for each of the words; Page 4, Paragraph 62: word embeddings [of integration model 100] are trained using the skip-gram or the CBOW model; Page 4, 
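To make the claimed data structures concrete, the following is a minimal sketch of a {word: subword set} dictionary and of {target, contextual} training pairs built with a sliding window. It illustrates the claim language only; the window size, the choice of character n-grams as subwords, and the function names are assumptions and are not taken from the application or from Hashimoto.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_subword_dict(sentences: List[str],
                       sizes: Sequence[int] = (2, 3)) -> Dict[str, Set[str]]:
    """Map each whitespace-separated word to its set of character n-grams."""
    table: Dict[str, Set[str]] = {}
    for sentence in sentences:
        for word in sentence.split():
            table[word] = {
                word[i:i + n]
                for n in sizes
                for i in range(len(word) - n + 1)
            }
    return table


def context_pairs(items: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Pair each item with its neighbours within `window` positions.

    With a list of words this yields {target word, contextual word} pairs;
    with a list of sentences it yields {target sentence, contextual sentence}
    pairs.
    """
    pairs: List[Tuple[str, str]] = []
    for t, target in enumerate(items):
        start = max(0, t - window)
        stop = min(len(items), t + window + 1)
        for c in range(start, stop):
            if c != t:
                pairs.append((target, items[c]))
    return pairs


subword_dict = build_subword_dict(["the cat sat"])
word_pairs = context_pairs("the cat sat".split())
sentence_pairs = context_pairs(["First sentence.", "Second sentence.", "Third sentence."])
```

The same pairing routine is reused for words and for sentences because both training sets have the same {target, context} shape.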
Regarding Claims 4 and 11, Hashimoto teaches all of the limitations of Claims 1 and 8 above. Furthermore, Hashimoto teaches:
The sentence embedding method of claim 1, wherein the syntagma-based position encoding comprises applying differentiated weights to full morphemes and bound morphemes constituting the sentence. (Hashimoto, Page 6, Paragraphs 125 and 128: The objective function for the dependency layer [for syntactic tasks involving the character embedding space as bound morphemes] uses weight parameter Wdep (Paragraph 128), while the objective function for the chunking layer [for word-level tasks involving the word embedding space as full morphemes] uses [a different] weight parameter Wchk (Paragraph 125))
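As a purely illustrative reading of the claim language quoted above (and not of Hashimoto's layers), "differentiated weights" can be pictured as one fixed weight for full morphemes and a different fixed weight for bound morphemes. The weight values, the "full"/"bound" labels, and the example segmentation below are hypothetical.

```python
from typing import List, Tuple

FULL_WEIGHT = 1.0   # hypothetical fixed weight for full morphemes
BOUND_WEIGHT = 0.5  # hypothetical fixed weight for bound morphemes


def position_weights(morphemes: List[Tuple[str, str]]) -> List[float]:
    """Return one fixed weight per (surface form, kind) morpheme."""
    weights: List[float] = []
    for surface, kind in morphemes:
        if kind == "full":
            weights.append(FULL_WEIGHT)
        elif kind == "bound":
            weights.append(BOUND_WEIGHT)
        else:
            raise ValueError(f"unknown morpheme kind for {surface!r}: {kind!r}")
    return weights


# Hypothetical segmentation: "walk" + "ed" + "home".
example = [("walk", "full"), ("ed", "bound"), ("home", "full")]
weights = position_weights(example)
```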
Regarding Claims 6 and 13, Hashimoto teaches all of the limitations of Claims 1 and 8 above. Furthermore, Hashimoto teaches:
A subword embedding and skip-thought sentence embedding integration model construction method for generating subword embedding vector values required for performing the sentence embedding method of claim 1, the method comprising: (Hashimoto, Figure 2A, Page 4, Paragraph 54: joint-embedding technique 200 [as 
generating a subword table including a {word: subword set} dictionary and a {subword: vector value} table by separating words and extracting subwords from training text including consecutive sentence context; (Hashimoto, Page 4, Paragraph 57: Character embedder 206 constructs [as generates] the vocabulary [as word-subword dictionary] of the character n-grams [as subword] in the training data [as training text] and assigns an embedding [as vector value] for each character n-gram; Figure 2, Page 4, Paragraphs 57-58: character [as subword] embedding space 208 includes a 1, 2, 3 and 4-gram [as consecutive sentence context] embedding; example of character n-grams (n=1, 2, 3) of the word "Cat" [represent separated and extracted subwords]; each word is represented as word representation x [as word-subword-vector value table], 222, the concatenation of its corresponding word embedding 210 and character embedding 220 [as subword vector values]; Page 4, Paragraph 63: vocabulary of the character n-grams is built on the training corpus, the case-sensitive English Wikipedia text [as training text]; Figure 1, Page 2, Paragraph 40: a part-of-speech (POS) label embedding layer 104a/b and a chunk/chunking label embedding layer [as separating/extracting subwords] 106a/b produce POS/chunk label embedding vectors and state vectors [as subword vector values] for each of the words)
generating subword embedding training data of {target word, contextual word} for subword embedding learning;  (Hashimoto, Page 4, Paragraph 57: Character embedder 
generating skip-thought sentence embedding training data of {target sentence, contextual sentence} for skip-thought sentence embedding learning; and (Hashimoto, Page 7, Paragraph 107: The relatedness vector calculator 720 calculates a sentence-level representation 708a and 708b of each of the first [as target] and second [as context] sentences; Page 13, Paragraph 168: sentence embedding layer can include a word embedder and an n-character-gram embedder; Page 4, Paragraph 5: character embedder 206 uses a skip-gram model to train the word embedding matrix)
constructing a subword embedding and skip-thought sentence embedding integration model from the subword embedding training data and the skip-thought sentence embedding training data. (Hashimoto, Page 4, Paragraph 56: character embedder 206 uses a skip-gram [as skip-thought] model to train the word embedding matrix [and] a continuous bag-of-words (CBOW) [as sub-word] model to train the character embedding matrix; character n-gram embeddings are learned using the same skip-gram objective function as the word vectors; Page 4, Paragraph 59: The n-character-gram embedder 206 combines the intermediate vectors for each unique processed character substring to produce character [as subword] embedding vectors [as generated subword embedding vector values] for each of the words; Page 4, Paragraph 62: word embeddings [of integration model 100] are trained using the skip-gram or the CBOW model; Page 4, 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 3, 7, 10, and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hashimoto, in view of Jawahar et al. (Non-Patent Literature titled "Doc2Sent2Vec: A Novel Two-Phase Approach for Learning Document Representation"), hereinafter Jawahar, and in view of Bojanowski.
Regarding Claims 3, 7, 10, and 14, Hashimoto teaches all of the limitations of Claims 2, 6, 9, and 13 above. However, Hashimoto does not teach:
The sentence embedding method of claim 2, wherein the subword embedding and skip-thought sentence embedding integration model finds a subword embedding value [symbol shown in image media_image1.png] which maximizes a log likelihood function L as shown in the following equation:

[Equation image: media_image2.png]

(where Tw denotes a size of subword embedding training data, Ts denotes a size of skip-thought sentence embedding training data, Ct denotes a set of contextual words wc of wt, and Nt denotes a contextual sentence set of a target sentence sentt). 
Bojanowski does teach:

[Equation image: media_image2.png]
 
The first half of the log likelihood function L, where Tw denotes a size of subword embedding training data, Ct denotes a set of contextual words wc of wt (Bojanowski, Page 136, Section 3.1:  Given a large training corpus [as Tw] represented as a sequence of words, the objective of the skip gram model is to maximize the following log-likelihood [focused on the word-level as L_word], where the context Ct is the set of indices of words surrounding 

[Equation image: media_image3.png]
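For reference, the word-level skipgram objective described in the cited passage of Bojanowski is commonly written as shown below. This assumes the cited equation is the standard published form; there, T is the length of the training corpus in words (mapped above to Tw), w_t is the target word, and C_t is the set of indices of the words surrounding w_t.

```latex
\[
  \sum_{t=1}^{T} \; \sum_{c \in \mathcal{C}_t} \log p\left(w_c \mid w_t\right)
\]
```

Each inner term scores how well the target word w_t predicts one of its context words w_c, and the outer sum runs over every position in the corpus.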

Hashimoto and Bojanowski are both considered to be analogous to the claimed invention because they are in the same field of machine language processing. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hashimoto (directed to a character-gram embedder that maps character substrings into an intermediate vector representing a position in the character embedding space, and to using a skip-gram model to train the word embedding matrix and a continuous bag-of-words (CBOW) model to train the character embedding matrix) and Bojanowski (directed to maximizing a log likelihood function L at the word level) and arrived at a joint-embedding technique that uses a subword embedding and skip-thought sentence embedding model and maximizes a log likelihood function L at the word level L_word. One of ordinary skill in the art would have been motivated to make such a combination because incorporating character n-grams into the skipgram model outperforms baselines that do not take into account subword information (Bojanowski, Page 145, Section 7).
However, Hashimoto in view of Bojanowski does not teach:

[Equation image: media_image2.png]


Jawahar does teach:

[Equation image: media_image2.png]

The second half of the log likelihood function L, where Ts denotes a size of skip-thought sentence embedding training data, and Nt denotes a contextual sentence set of a target sentence sentt (Jawahar, Page 3, Equation 11: objective function to maximize log likelihood probability L [at both the word level L_word and sentence level L_sent])

[Equation image: media_image4.png]


Hashimoto, Bojanowski, and Jawahar are all considered to be analogous to the claimed invention because they are in the same field of machine language processing. Accordingly, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to have combined Hashimoto (directed to a character-gram embedder that maps character substrings into an intermediate vector representing a position in the character embedding space, and to using a skip-gram model to train the word embedding matrix and a continuous bag-of-words (CBOW) model to train the character embedding matrix), Bojanowski (directed to maximizing a log likelihood function L at the word level), and Jawahar (directed to maximizing a log likelihood function L at both the word level L_word and the sentence level L_sent). One of ordinary skill in the art would have been motivated to make such a combination because a novel sentence-level language model which exploits the sentence sequence present in the document can provide accurate and rich document representations (Jawahar, Page 4, Section 5).
Allowable Subject Matter
Claims 5 and 12 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.  
Regarding Claims 5 and 12, Hashimoto teaches all of the limitations of Claims 1 and 8 above (specifically, a character-gram embedder that maps character substrings into an intermediate vector representing a position in the character embedding space, and the use of a skip-gram model to train the word embedding matrix and a continuous bag-of-words (CBOW) model to train the character embedding matrix), and Bojanowski is directed to maximizing a log likelihood function L at the word level. However, Hashimoto does not teach:
The sentence embedding method of claim 1, wherein the performing of sentence embedding calculation comprises multiplying the subword embedding vector values and the position encoding values together and calculating an average of the products regarding the whole sentence.  
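The calculation recited in this limitation (multiply each subword embedding vector by its position encoding value, then average the products over the whole sentence) can be sketched as follows. This is a minimal illustration under stated assumptions: each position encoding value is taken to be a single scalar per subword, and the function name and sample numbers are hypothetical rather than drawn from the specification.

```python
from typing import List


def sentence_embedding(subword_vectors: List[List[float]],
                       position_values: List[float]) -> List[float]:
    """Average of (subword vector * position value) over the sentence."""
    if not subword_vectors or len(subword_vectors) != len(position_values):
        raise ValueError("need one position value per subword vector")
    dim = len(subword_vectors[0])
    total = [0.0] * dim
    for vector, position in zip(subword_vectors, position_values):
        for k in range(dim):
            # Scale each component by the position encoding value, then accumulate.
            total[k] += vector[k] * position
    return [component / len(subword_vectors) for component in total]


# Two hypothetical subword vectors with position values 1.0 and 0.5.
vectors = [[1.0, 2.0], [3.0, 4.0]]
positions = [1.0, 0.5]
embedding = sentence_embedding(vectors, positions)  # [1.25, 2.0]
```

If the position encoding values were vectors instead of scalars, the scaling step would become an element-wise product; the averaging step would stay the same.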
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Ghaeini et al. (US Patent Document US-2019/0205733-A1) teaches a skip-thought encoder that effectively functions as a lookup table that maps sequences of words (or sentences) to sequences of word embeddings (Page 3, Paragraph 24).
Lee et al. (US Patent Document US-2018/0121419-A1) teaches that a plurality of weights associated with a plurality of pieces of sampling data corresponding to an input sentence may be calculated (Page 4, Paragraph 55).
Lin et al. (US Patent Document US-2020/0050667-A1) teaches preprocessing the sentence input to allow tokenization thereof at a word level (Page 2, Paragraph 32).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANUP CHANDORA whose telephone number is (571)272-4202.  The examiner can normally be reached on Full-time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Paras Shah can be reached on (571) 270-1650.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent 


/ANUP CHANDORA/Examiner, Art Unit 2658                                          
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658