DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-25 are currently pending.


Information Disclosure Statement
The Information Disclosure Statements (IDSs) submitted by Applicant on 8/29/2017 and 1/28/2019 have been considered.  

Claim Objections
Claim 8 is objected to because of the following informalities:
Claim 8 states an incorrect dependency: it recites dependency on claim 9. Examiner understands this to be an inadvertent typographical error. In view of the remaining recitations of claim 8, Examiner is interpreting claim 8 as dependent on preceding claim 7. Examiner notes that this interpretation is consistent with analogous system claims 22 and 23.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
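(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transitional word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 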
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
In claims 7-8, 15, and 22-23 - “first-party component” and “second-party component”. 
In claim 24 – “reference text data generation component”.
In claim 24 – “machine learning component”.
In claim 25 – “distribution generation component”.
In claim 25 – “feature matrix generation component”. 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
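If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.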


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

12.	Claims 1-2, 9-10, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Le (US 20150220833 A1, hereinafter referred to as “Le”) in view of Quoc Le and Tomas Mikolov, “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference on Machine Learning, 2014 (hereinafter referred to as “Le and Mikolov”). 

Regarding claim 1, Le teaches a computer-implemented method for performing unsupervised feature representation learning for text data comprising:
generating, by a processor system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from a distribution (Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer.); 

…; and 

providing, by the processor system, the feature matrix as an input to one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system.).

	Examiner believes that Le teaches or at least implies generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However, said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5). 
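
For orientation only, the generating and feature-matrix limitations of claim 1 as recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not Applicant's disclosed implementation and not a teaching of either reference: the function names, the uniform word weights, and the Jaccard distance are hypothetical choices made for brevity, since claim 1 does not specify a particular distribution or distance.

```python
import random

def generate_reference_sequences(vocab, weights, count, min_len, max_len, seed=0):
    """Draw `count` random word sequences; each length is sampled from [min_len, max_len]."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)  # random length, bounds inclusive
        # Words are drawn (with replacement) from the supplied distribution.
        sequences.append(rng.choices(vocab, weights=weights, k=length))
    return sequences

def jaccard_distance(a, b):
    """Assumed stand-in distance: 1 - |A & B| / |A | B| over the two word sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

def feature_matrix(documents, reference_sequences):
    """Row i, column j holds the distance between document i and reference sequence j."""
    return [[jaccard_distance(doc.split(), ref) for ref in reference_sequences]
            for doc in documents]

vocab = ["cat", "dog", "runs", "sleeps", "fast"]
weights = [1, 1, 1, 1, 1]  # uniform distribution (assumption)
refs = generate_reference_sequences(vocab, weights, count=4, min_len=1, max_len=3)
matrix = feature_matrix(["the cat sleeps", "dog runs fast"], refs)
```

In this sketch each row of `matrix` is the feature vector of one raw document, so the matrix could be passed directly to a downstream model.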
	



	Regarding claim 2, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 1, and the combination further teaches wherein generating the feature matrix includes:

computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]
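
The notion of examining how close document vector representations are to each other can be illustrated with a short sketch. Cosine distance is assumed here purely as an example of a closeness measure; Le, Paragraph [0013], as cited above, is not quoted as naming a specific metric, and the vectors and document names below are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v); 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def nearest_document(query_vector, document_vectors):
    """Return the name of the document whose vector is closest to the query."""
    return min(document_vectors,
               key=lambda name: cosine_distance(query_vector, document_vectors[name]))

# Hypothetical two-dimensional document vectors.
document_vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}
closest = nearest_document([0.9, 0.1], document_vectors)
```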





Regarding claim 9, Le teaches a computer program product for performing unsupervised feature representation learning for text data, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method (Le, Paragraph [0006] teaches computer programs recorded on one or more computer storage devices configured to perform the actions of the methods; Le, Paragraph [0054] teaches “one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus”.) comprising: 

generating, by the processor system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from a distribution (Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer.); 

…; and 

providing, by the processor system, the feature matrix as an input to one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system.).

	Examiner believes that Le teaches or at least implies generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5). 
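
The passage quoted from Le and Mikolov, section 2.1 (each word mapped to a column of a matrix, indexed by vocabulary position, with the vectors concatenated or summed) can be sketched as below. The vocabulary, the integer matrix values, and the function names are hypothetical and chosen only so the arithmetic is easy to follow; no training step is shown.

```python
def column(matrix, index):
    """Return column `index` of a row-major matrix as a list."""
    return [row[index] for row in matrix]

def context_features(words, vocab, word_matrix, combine="concat"):
    """Look up each word's column vector, then concatenate or sum the vectors."""
    vectors = [column(word_matrix, vocab.index(word)) for word in words]
    if combine == "concat":
        return [value for vector in vectors for value in vector]
    if combine == "sum":
        return [sum(values) for values in zip(*vectors)]
    raise ValueError("combine must be 'concat' or 'sum'")

vocab = ["the", "cat", "sat"]   # column j of word_matrix belongs to vocab[j]
word_matrix = [[1, 2, 3],
               [4, 5, 6]]       # hypothetical 2-dimensional word vectors
concatenated = context_features(["the", "cat"], vocab, word_matrix, "concat")
summed = context_features(["the", "cat"], vocab, word_matrix, "sum")
```

Concatenation preserves word order at the cost of a longer feature vector, while summation keeps the dimension fixed regardless of context length.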



Regarding claim 10, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 9, and the combination further teaches wherein generating the feature matrix includes:

computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5).



Regarding claim 16, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising one or more processors configured to perform a method (Le, Paragraphs [0020], [0023], and [0024] teach a system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s).) comprising: 
generating, by the system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from a distribution (Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer.); 

…; and 

providing, by the system, the feature matrix as an input to one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system.).

	Examiner believes that Le teaches or at least implies generating, by the system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5). 
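
The sliding-window extraction cited above from Le, Paragraph [0035] (each possible sequence of a predetermined fixed length taken from the document) can be sketched as follows. The function name and the whitespace tokenization are assumptions made for illustration; Le's own tokenization is not reproduced here.

```python
def sliding_windows(words, window):
    """Return every contiguous run of exactly `window` words, in document order."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    # A document shorter than the window yields no sequences.
    return [words[i:i + window] for i in range(len(words) - window + 1)]

tokens = "the cat sat down".split()  # naive whitespace tokenization (assumption)
sequences = sliding_windows(tokens, 2)
```

A document of n words yields n - window + 1 sequences, all of the same length.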




Regarding claim 17, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 16, and the combination further teaches wherein generating the feature matrix includes: computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]

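Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5).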


13.	Claims 3-4, 7-8, 11-12, 15, 18-19, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Le (US 20150220833 A1, hereinafter referred to as “Le”) in view of Quoc Le and Tomas Mikolov, “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference on Machine Learning, 2014 (hereinafter referred to as “Le and Mikolov”), in further view of Stankiewicz et al. (US 20190065550 A1). 

Regarding claim 3, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein the distribution comprises a random probability distribution of a word vector space.

Nevertheless, Stankiewicz teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow the document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraph [0040]). 
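
The idea of representing a document as a likelihood distribution over topics based on its terms can be illustrated with a deliberately simplified sketch. This is not LDA inference and is not taken from Stankiewicz: it assumes the per-topic term probabilities are already known, applies a uniform topic prior, and normalizes the resulting likelihoods. All names and probability values are hypothetical.

```python
import math

def topic_distribution(document_terms, topic_term_probs, floor=1e-9):
    """Normalize per-topic term likelihoods into a distribution over topics."""
    log_scores = {}
    for topic, term_probs in topic_term_probs.items():
        # Unseen terms get a small floor probability so the log is defined.
        log_scores[topic] = sum(math.log(term_probs.get(term, floor))
                                for term in document_terms)
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    weights = {topic: math.exp(score - peak) for topic, score in log_scores.items()}
    total = sum(weights.values())
    return {topic: weight / total for topic, weight in weights.items()}

# Hypothetical, hand-written topic-term probabilities.
topics = {
    "pets": {"cat": 0.5, "dog": 0.5},
    "sports": {"ball": 0.5, "goal": 0.5},
}
distribution = topic_distribution(["cat", "dog"], topics)
```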



Regarding claim 4, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data.

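	Nevertheless, Stankiewicz teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).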

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).
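
A probability distribution generated from raw text, from which random words are then drawn, can be sketched in its simplest form as below. An empirical unigram (word-frequency) distribution is assumed here in place of the LDA and word-embedding techniques cited from Stankiewicz, purely to keep the example short; the function names and sample text are hypothetical.

```python
import random
from collections import Counter

def unigram_distribution(raw_text):
    """Estimate P(word) from relative word frequencies in the raw text."""
    counts = Counter(raw_text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def draw_words(distribution, k, seed=0):
    """Draw k words (with replacement) according to the given distribution."""
    rng = random.Random(seed)
    words = list(distribution)
    weights = [distribution[word] for word in words]
    return rng.choices(words, weights=weights, k=k)

distribution = unigram_distribution("the cat sat on the mat")
sample = draw_words(distribution, k=4)
```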



Regarding claim 7, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 4, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the probability distribution from the raw text data, transmit the probability distribution of the raw text data to the second-party component (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; and Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).

[EXAMINER NOTE: As previously stated in this office action, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).



Regarding claim 8, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations [of claim 7], and the combination further teaches wherein the second-party component is configured to receive the probability distribution from the first-party component, generate the reference text data, transmit the reference text data to the first-party component, receive the generated feature matrix from the first-party component, provide the feature matrix as the input to the one or more machine learning models, and transmit results from the machine learning models to the first-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed; Le, Paragraphs [0028] and [0035] teach generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]; Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; and Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”).

[EXAMINER NOTE: Claim 8 has been objected to because of the incorrect dependency stated in the preamble of the claim. Claim 8, as currently drafted, states dependency on claim 9. Examiner has understood this as an inadvertent typographical error. Given the limitations recited in the claim, Examiner has interpreted claim 8 as dependent on claim 7.]


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, systems, and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).
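
The order of exchanges recited in claims 7 and 8 between the first-party and second-party components can be sketched as a sequence of in-process calls. This is a minimal illustration only: the class and method names, the unigram distribution, the Jaccard distance, and the placeholder "model" are all assumptions, and no network transport, security, or actual machine learning model is shown.

```python
import random
from collections import Counter

def word_set_distance(a, b):
    """Assumed stand-in distance: 1 - |A & B| / |A | B| over the two word sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return 1.0 - len(set_a & set_b) / len(union) if union else 0.0

class FirstParty:
    """Holds the raw text; shares only the distribution and the feature matrix."""
    def __init__(self, documents):
        self.documents = documents

    def make_distribution(self):
        counts = Counter(word for doc in self.documents for word in doc.split())
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    def make_feature_matrix(self, reference_sequences):
        return [[word_set_distance(doc.split(), ref) for ref in reference_sequences]
                for doc in self.documents]

class SecondParty:
    """Generates reference text and runs the model; never sees the raw text."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def make_reference_text(self, distribution, count, min_len, max_len):
        words = list(distribution)
        weights = [distribution[word] for word in words]
        return [self.rng.choices(words, weights=weights,
                                 k=self.rng.randint(min_len, max_len))
                for _ in range(count)]

    def run_model(self, feature_matrix):
        # Placeholder "model": index of the nearest reference sequence per document.
        return [row.index(min(row)) for row in feature_matrix]

first = FirstParty(["the cat sleeps", "the dog runs"])
second = SecondParty()
distribution = first.make_distribution()                          # first -> second
references = second.make_reference_text(distribution, 3, 1, 2)    # second -> first
matrix = first.make_feature_matrix(references)                    # first -> second
results = second.run_model(matrix)                                # second -> first
```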


Regarding claim 11, the combination of Le in view of Le and Mikolov teaches all the limitations of claim 9, however, the combination does not distinctly disclose wherein the distribution comprises a random probability distribution of a word vector space.

Nevertheless, Stankiewicz teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow the document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraph [0040]).


Regarding claim 12, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 9, however the combination does not distinctly disclose wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data.

Nevertheless, Stankiewicz teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).
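For illustration only, the general notion of a probability distribution over terms derived from unstructured text can be sketched as below. This is a simplified unigram frequency estimate, not LDA as described by Stankiewicz and not the method recited in the claims; the function name `term_distribution` and the sample sentence are hypothetical.

```python
from collections import Counter
from typing import Dict


def term_distribution(raw_text: str) -> Dict[str, float]:
    """Estimate a unigram probability distribution over terms in raw text.

    Assumption: whitespace tokenization and lowercasing stand in for
    whatever preprocessing a real system would apply.
    """
    tokens = raw_text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Each term's probability is its relative frequency in the text.
    return {term: n / total for term, n in counts.items()}


dist = term_distribution("the cat sat on the mat")
```

The resulting mapping sums to one over the observed vocabulary, which is the only property the sketch is meant to show.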

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).



Regarding claim 15, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 12, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the probability distribution from the raw text data, transmit the probability distribution of the raw text data to the second-party component (Le, Paragraphs [0028] and [0035] teach generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] further teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).
 
[EXAMINER NOTE: As previously stated in this office action, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).



Regarding claim 18, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 16, however, the combination does not distinctly disclose wherein the distribution comprises a random probability distribution of a word vector space.

Nevertheless, Stankiewicz teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

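Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow the document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraph [0040]).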



Regarding claim 19, the combination of Le in view of Le and Mikolov teaches all of the limitations of claim 16, however, the combination does not distinctly disclose wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data.

	Nevertheless, Stankiewicz teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

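Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).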



Regarding claim 22, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 19, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the probability distribution from the raw text data, transmit the probability distribution of the raw text data to the second-party component (Le, Paragraphs [0028] and [0035] teach generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] further teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).
 
[EXAMINER NOTE: As previously stated in this office action, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

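Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).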



Regarding claim 23, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 22, and the combination further teaches wherein the second-party component is configured to receive the probability distribution from the first-party component, generate the reference text data, transmit the reference text data to the first-party component, receive the generated feature matrix from the first-party component, provide the feature matrix as the input to the one or more machine learning models, and transmit results from the machine learning models to the first-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed; Le, Paragraphs [0028] and [0035] teach generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]; Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And, Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”; Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).





Regarding claim 24, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising: a processor; a memory; (Le, Paragraph [0020], [0023], and [0024] teaches system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s) and memory.)

a reference text data generation component configured to … generate reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from the probability distribution (Le, …); and

a machine learning component (Le, Paragraph [0023] teaches system includes a neural network system) configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.); and 

provide the feature matrix as an input to one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system.; [Note: Le and Mikolov, section 2.2, also teaches after being trained, using the paragraph vectors as features and feeding the features to a conventional machine learning technique.]).


	Although Le discloses a machine learning component that may exchange data to and from the vector representation system 100 (Le, Paragraph [0023] teaches system includes a neural network system), Le does not distinctly disclose a machine learning component configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data. 

	Nevertheless, Le and Mikolov teaches a machine learning component configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.). 
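For readers less familiar with the recited limitation, the following minimal sketch shows one way a feature matrix could be built from computed distances between a set of random text sequences and documents. It is an illustration under stated assumptions, not the method of Le, of Le and Mikolov, or of Applicant's specification: the Jaccard distance is a stand-in for an unspecified document distance measuring technique, and every function name, parameter, and sample value is hypothetical.

```python
import random
from typing import Dict, List


def sample_random_sequences(dist: Dict[str, float], num_sequences: int,
                            min_len: int, max_len: int,
                            seed: int = 0) -> List[List[str]]:
    """Draw random word sequences whose words follow `dist`.

    Each sequence length is sampled uniformly from [min_len, max_len].
    """
    rng = random.Random(seed)
    words = list(dist.keys())
    weights = list(dist.values())
    sequences = []
    for _ in range(num_sequences):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        sequences.append(rng.choices(words, weights=weights, k=length))
    return sequences


def jaccard_distance(a: List[str], b: List[str]) -> float:
    """Assumed stand-in distance: 1 - |A ∩ B| / |A ∪ B| over word sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def feature_matrix(documents: List[List[str]],
                   sequences: List[List[str]]) -> List[List[float]]:
    """One row per document, one column per random sequence."""
    return [[jaccard_distance(doc, seq) for seq in sequences]
            for doc in documents]


# Hypothetical toy inputs.
dist = {"cat": 0.5, "dog": 0.3, "fish": 0.2}
docs = [["cat", "dog"], ["fish"]]
seqs = sample_random_sequences(dist, num_sequences=4, min_len=1, max_len=3)
matrix = feature_matrix(docs, seqs)
```

Each row of `matrix` could then be passed as a feature vector to a conventional machine learning model; the choice of distance function is the main design decision and is left open here.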



Although Examiner believes that the combination of Le in view of Le and Mikolov substantially teaches the invention, the combination does not distinctly disclose …to receive a probability distribution of raw text data… 

Nevertheless, Stankiewicz teaches …to receive a probability distribution of raw text data… (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.), …
	
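Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).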



14.	Claims 5, 13, 20, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Le (US 20150220833 A1, hereinafter referred to as “Le”) in view of Quoc Le and Tomas Mikolov, “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference on Machine Learning, 2014 (hereinafter referred to as “Le and Mikolov”), in further view of Stankiewicz et al. (US 20190065550 A1), in further view of Xiong et al. (US 20180129938 A1). 

Regarding claim 5, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 4. However, the combination does not distinctly disclose wherein the word vector space comprises a pre-trained word2vec embedding space.

Nevertheless, Xiong teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question (Xiong, Paragraph [0038]). 



Regarding claim 13, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 12. However, the combination does not distinctly disclose wherein the word vector space comprises a pre-trained word2vec embedding space.

Nevertheless, Xiong teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 






Regarding claim 20, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 19. However, the combination does not distinctly disclose wherein the word vector space comprises a pre-trained word2vec embedding space.

Nevertheless, Xiong teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 

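Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question (Xiong, Paragraph [0038]).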



Regarding claim 25, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising: a processor; a memory; (Le, Paragraph [0020], [0023], and [0024] teaches system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s) and memory.)

a feature matrix generation component configured to: receive reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from the probability distribution (Le, Paragraph [0028] teaches providing the document representation to a separate system, for …); and …

However, Le does not distinctly disclose generate a feature matrix for the raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data using a document distance measuring technique. 

Nevertheless, Le and Mikolov teaches generate a feature matrix for the raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data using a document distance measuring technique (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.). 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models (Le and Mikolov, section 3.3 and section 5). 

The combination of Le in view of Le and Mikolov does not distinctly disclose a distribution generation component configured to generate a probability distribution of raw text data, wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space;…

Nevertheless, Stankiewicz teaches: 
a distribution generation component configured to generate a probability distribution of raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.), …

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences (Stankiewicz, Paragraphs [0040]-[0041]).


However, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz does not distinctly disclose … wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space;… 

Nevertheless, Xiong teaches: 
… wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.); … 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question (Xiong, Paragraph [0038]). 


14.	Claims 6, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Le (US 20150220833 A1, hereinafter referred to as “Le”) in view of Quoc Le and Tomas Mikolov, “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference on Machine Learning, 2014 (hereinafter referred to as “Le and Mikolov”), in further view of Stankiewicz et al. (US 20190065550 A1), in further view of Franziska Horn, “Context encoders as a simple but powerful extension of word2vec”, June 2017.


Regarding claim 6, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 4. However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2). 
	


Regarding claim 14, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 12. However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2). 



Regarding claim 21, the combination of Le in view of Le and Mikolov, in further view of Stankiewicz teaches all of the limitations of claim 19. However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, as further modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2). 


Prior Art
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Mirowski et al. (US 20120150532 A1), disclosing a system and method for feature rich continuous space language models, comprising a word embedding phase and a feature embedding phase, feeding the low-dimensional embedding into a linear matrix and a neural network architecture to produce a prediction; claim 8 of Mirowski further disclosing one or more modules to perform the method/process. 
He et al. (US 20180293499 A1), disclosing “receiving a vocabulary, the vocabulary including text data that is provided as at least a portion of raw data, the raw data being provided in a computer-readable file, associating each word in the vocabulary with a feature vector, providing a sentence embedding for each sentence of the vocabulary based on a plurality of feature vectors to provide a plurality of sentence embeddings, providing a reconstructed sentence embedding for each sentence embedding based on a weighted parameter matrix to provide a plurality of reconstructed sentence embeddings, and training the unsupervised neural attention model based on the sentence embeddings and the reconstructed sentence embeddings to provide a trained neural attention model, the trained neural attention model being used to automatically determine aspects from the vocabulary.” 
Soni et al. (US 20180285459 A1), disclosing a framework that extends word2vec and doc2vec (a.k.a., Paragraph Vector), where the representations of words, documents, and tags are simultaneously learned in a joint vector space during training, and employing a distance measuring technique (i.e., k-nearest neighbor search) to predict tags for unseen documents.
Jaech et al. (US 20180349477 A1), disclosing a tensor-based deep relevance model for search on online social networks, where the system may use a word2vec model as the word embedding model; also disclosing the use of 256-dimensional phrase embeddings pre-trained using the word2vec package on a large corpus of documents with a vocabulary size of around 2 million tokens containing unigrams and selected bigrams.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156.  The examiner can normally be reached on Mon. - Fri. 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 






/B.R.B./ Examiner, Art Unit 2123
/ALEXEY SHMATOV/ Supervisory Patent Examiner, Art Unit 2123