DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
2.	Claims 1, 7, 8, 9, 15, 16, 22, 23, 24, and 25 have been amended by Applicant. No claims have been currently cancelled or added. Claims 1-25 remain currently pending. 
Response to Arguments
Claim Interpretation under 35 U.S.C. 112(f)
3.	Claim interpretation under 35 U.S.C. 112(f) has been maintained. See pertinent section further below. 
Claim Rejections under 35 U.S.C. 112(b)
4.	The rejection of claims 7-8, 15, and 22-23 has been withdrawn in view of Applicant’s amendment to said claims.
Claim Rejections under 35 U.S.C. 101
5.	The rejection of claims 1-25 (as amended) under 35 U.S.C. 101 has been maintained. 
Applicant's arguments filed 06/15/2022 have been fully considered but they are not persuasive. 
Applicant has amended claims 1, 9, 16, 24, and 25 to similarly recite the limitation “provide the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being input to improve the processing efficiency of the one or more machine learning models.”
However, the added limitation is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(h)). Furthermore, the phrase “to improve the processing efficiency of the one or more machine learning models” has been understood as intended-use language carrying no patentable weight. 

Claim Rejection under 35 U.S.C. 103
The rejection of claims 1-5, 7-13, 15-20, and 22-25 under 35 U.S.C. 103 has been maintained.
The rejection of claims 6, 14, and 21 under 35 U.S.C. 103 has been maintained.
Applicant’s arguments with respect to claim 1 (as amended) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
In claims 7-8, 15, and 22-23 – “first-party component” and “second-party component”.
In claim 24 – “reference text data generation component”.
In claim 24 – “machine learning component”.
In claim 25 – “distribution generation component”.
In claim 25 – “feature matrix generation component”.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

	Claims 1-25 (as amended) are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
	Claims 1, 9, 16, 24, and 25 (as amended) are respectively drawn to a method, a computer program product comprising a computer readable storage medium, and systems, hence each falls under one of four categories of statutory subject matter (Step 1). Nonetheless, the claims are directed to a judicially recognized exception of an abstract idea without significantly more.
	Independent claims 1, 9, and 16 (as amended) recite the following same or analogous limitations: 
generating, by a processor system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random text sequences, the minimum length being a first value, the maximum length being a second value greater than the first value, and wherein the random words of each text sequence in the set are drawn from a random probability distribution of raw text data derived from a pre-trained word vector space; 
generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data; and 
providing, by the processor system, the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being input to improve the processing efficiency of the one or more machine learning models.
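
For illustration only, the two generating steps recited above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any reference of record. The function names (generate_random_sequences, bow_distance, feature_matrix), their parameters, and the sample documents are hypothetical. An empirical word-frequency distribution stands in for the claimed distribution derived from a pre-trained word vector space, and Euclidean distance over word counts stands in for whatever distance measure the application actually employs.

```python
import math
import random
from collections import Counter


def generate_random_sequences(raw_docs, num_sequences, min_len, max_len, seed=0):
    """Build random text sequences of random length from the raw text."""
    if not 0 < min_len < max_len:
        raise ValueError("expected 0 < min_len < max_len")
    rng = random.Random(seed)
    # Assumption: word frequencies in the raw text stand in for the claimed
    # distribution derived from a pre-trained word vector space.
    counts = Counter(word for doc in raw_docs for word in doc.split())
    vocab = sorted(counts)
    weights = [counts[word] for word in vocab]
    sequences = []
    for _ in range(num_sequences):
        # Sample a length from the minimum through the maximum (inclusive).
        length = rng.randint(min_len, max_len)
        # Draw that many words according to the assumed distribution.
        sequences.append(rng.choices(vocab, weights=weights, k=length))
    return sequences


def bow_distance(words_a, words_b):
    """Euclidean distance between word-count vectors (an assumed metric)."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    keys = set(count_a) | set(count_b)
    return math.sqrt(sum((count_a[k] - count_b[k]) ** 2 for k in keys))


def feature_matrix(raw_docs, sequences):
    """One row per raw document, one column per random text sequence."""
    return [[bow_distance(doc.split(), seq) for seq in sequences]
            for doc in raw_docs]


# Hypothetical raw text data.
raw_docs = ["the cat sat on the mat", "a dog chased the cat"]
sequences = generate_random_sequences(raw_docs, num_sequences=4,
                                      min_len=2, max_len=5)
matrix = feature_matrix(raw_docs, sequences)
```

In this sketch each row of the resulting matrix corresponds to one raw document and each column to one random text sequence; the matrix could then be passed as input to a downstream model.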

Step 2A Prong 1: 
The limitation of “generating…reference text data…”, as drafted, is a process that, under its broadest reasonable interpretation covers performance of the limitation in the mind and/or with the aid of pen and paper but for the recitation of generic computer components. That is, other than reciting “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16], nothing in the claim elements precludes the step from practically being performed in the mind. For example, but for the “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16] language, “generating…reference text data…” in the context of this claim encompasses a person manually generating random sentences or paragraphs where the random words of said generated sentences or paragraphs are chosen from a random probability distribution of raw text data. 
Similarly, the limitation of “generating…a feature matrix…”, as drafted, is a process that, under its broadest reasonable interpretation covers performance of the limitation in the mind and/or with the aid of pen and paper but for the recitation of generic computer components. That is, other than reciting “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16], nothing in the claim elements precludes the step from practically being performed in the mind. For example, but for the “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16] language, “generating…a feature matrix…” in the context of this claim encompasses a person mentally or manually generating a matrix of numeric values, characters, or words [i.e., features] wherein those values/characters/words are obtained by applying any distance metric (e.g., Euclidean distance) that measures character or word distances to manually calculate distances between the set of random text sequences and the raw text data.
If a claim limitation under its broadest reasonable interpretation, covers performance of the limitation in the mind (and/or with the aid of pen and paper) but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas. Accordingly, the claims recite an abstract idea. 

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16] to perform the limitations/steps listed above. These components in all steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(h)). Further, the “providing, …, the feature matrix…” step is recited at a high level of generality and amounts to no more than mere data transmission, which is a form of extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Hence, the claims are directed to an abstract idea. 

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a processor system, computer readable storage medium and/or one or more processors to perform the limitations/steps listed above amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the “providing, …, the feature matrix…” step was considered to be extra-solution activity in Step 2A, Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claims). Thereby, a conclusion that the claimed “providing, …, the feature matrix…” step is well-understood, routine, conventional activity is supported under Berkheimer. Hence, the claims are not patent eligible.

Independent system claim 24 (as amended) recites the following limitations: 
a reference text data generation component configured to receive a random probability distribution of raw text data, and to generate reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from the random probability distribution of raw text data derived from a pre-trained word vector space; and 
a machine learning component configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data; and 
provide the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being input to improve the processing efficiency of the one or more machine learning models.


Step 2A, Prong 1:
The limitation “…generate reference text data…”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind and/or with the aid of pen and paper but for the recitation of generic computer components. That is, other than reciting “a processor”, “a memory”, “a reference text data generation component”, and “a machine learning component” [i.e., machine-executable component(s) embodied within machine(s) (e.g., embodied in one or more computer readable mediums)], nothing in the claim elements precludes the step from practically being performed in the mind (and/or with the aid of pen and paper). For example, but for the “processor”, “memory”, “reference text data generation component”, or “machine learning component” language, “…generate reference text data…” in the context of this claim encompasses a person manually generating random sentences or paragraphs where the random words of said generated sentences or paragraphs are chosen from a random probability distribution of raw text data.
Similarly, the limitation “wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind and/or with the aid of pen and paper but for the recitation of generic computer components. That is, other than reciting “a processor”, “a memory”, “a reference text data generation component”, and “a machine learning component” [i.e., machine-executable component(s) embodied within machine(s) (e.g., embodied in one or more computer readable mediums)], nothing in the claim elements precludes the step from practically being performed in the mind. For example, but for the “processor”, “memory”, “reference text data generation component”, or “machine learning component” language, the “wherein the feature matrix is generated …” step in the context of this claim encompasses a person mentally or manually generating a matrix of numeric values, characters, or words [i.e., features] wherein those values/characters/words are obtained by applying any distance metric (e.g., Euclidean distance) that measures character or word distances to manually calculate distances between the set of random text sequences and the raw text data.
If a claim limitation under its broadest reasonable interpretation, covers performance of the limitation in the mind (and/or with the aid of pen and paper) but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements “a processor”, “a memory”, “a reference text data generation component”, and “a machine learning component” [i.e., machine-executable component(s) embodied within machine(s) (e.g., embodied in one or more computer readable mediums)] to perform the limitations/steps listed above. These components in all steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (i.e., generally linking the use of the judicial exception to a particular technological environment – see MPEP 2106.05(h)). Further, the “receive…” and “provide…” steps are recited at a high level of generality and amount to no more than mere data transmission, which is a form of extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Hence, the claims are directed to an abstract idea. 

	Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a processor, a memory, a reference text data generation component and a machine learning component to perform the limitations/steps listed above amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the “receive…” and “provide…” steps were considered to be extra-solution activity in Step 2A, Prong 2, and thus they are re-evaluated in Step 2B to determine if they are more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claims). Thereby, a conclusion that the claimed “receive…” and “provide…” steps are well-understood, routine, conventional activity is supported under Berkheimer. Hence, the claims are not patent eligible.


Independent system claim 25 (as amended) recites the following limitations: 
a distribution generation component configured to generate a random probability distribution of raw text data, wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space; and 
a feature matrix generation component configured to: receive reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, wherein each random length is sampled from a minimum length to a maximum length, and wherein the random words of each text sequence in the set are drawn from the probability distribution; and
generate a feature matrix for the raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data using a document distance measuring technique.
provide a feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being input to improve the processing efficiency of the one or more machine learning models.

Dependent claims 2-8, 10-15, and 17-23 (as amended) are also ineligible for the same reasons given with respect to claims 1, 9, and 16. The dependent claims describe additional mental processes: 
mentally, or with the aid of pen and paper, computing, …, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique; and concatenating, by the system, the feature vectors to generate the feature matrix (claims 2, 10, 17) (e.g., mentally or manually generating a set of vectors of numeric values, characters, or words [i.e., features] wherein those values/characters/words are obtained by applying any distance metric [e.g., Euclidean distance] that measures character or word distances to manually calculate distances between the set of random text sequences and the raw text data. Mentally or manually linking/joining together the values/characters/words from the vectors to generate the feature matrix).
wherein the distribution comprises a random probability distribution of a word vector space (claims 3, 11, 18) (i.e., these claims do not recite an active functional limitation/step and merely describe what the distribution comprises).
wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (claims 4, 12, 19) (i.e., these claims do not recite an active functional limitation/step and merely describe what the distribution comprises).
wherein the word vector space comprises a pre-trained word2vec embedding space (claims 5, 13, 20) (i.e., these claims do not recite an active functional limitation/step and merely describe what the word vector space comprises).
wherein the word vector space comprises a trained word2vec embedding space (claims 6, 14, 21) (i.e., these claims do not recite an active functional limitation/step and merely describe what the word vector space comprises).
generate the probability distribution from the raw text data (i.e., merely a mathematical calculation step – falls under the mathematical concept grouping of abstract ideas) and mentally, or with the aid of pen and paper generate the feature matrix based at least in part on the set of random text sequences (claims 7, 15, 22) (e.g., as inherited from step in claims 1, 9, and 16 - encompasses a person mentally or manually generating a matrix of numeric values, characters, or words [i.e., features] wherein those values/characters/words are obtained by applying any distance metric (e.g., Euclidean distance) that measures character or word distances to manually calculate distances between the set of random text sequences and the raw text data).
wherein the second-party component is configured to receive the probability distribution from the first-party component, generate the reference text data, transmit the reference text data to the first-party component, receive the generated feature matrix from the first-party component, provide the feature matrix as the input to the one or more machine learning models, and transmit results from the machine learning models to the first-party component (claims 8, 23) (e.g., generate the reference text data is a mental process as inherited from claims 1 and 16 - encompasses a person manually generating random sentences or paragraphs where the random words of said generated sentences or paragraphs are chosen from a random probability distribution of raw text data).

Again, the dependent claims continue to cover the performance of the limitations in the mind as inherited from independent claims 1, 9, and 16 (Step 2A, Prong 1). The dependent claims restating “by a processor system” [claim 1], “a computer readable storage medium” [claim 9], and “one or more processors” [claim 16] to perform the steps of the dependent claims recite no more than generic computer components to apply the exception and do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Similarly, the dependent claims 7, 8, 15, 22, and 23 reciting a “first-party component” and a “second-party component” to perform the steps in said claims recite no more than generic computer components to apply the exception and do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Further, the steps of “transmit…”, “receive…”, and “provide…”, in claims 7, 8, 15, 22, and 23, are recited at a high level of generality and amount to no more than mere data transmission, which is a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2; see MPEP 2106.05(h)). Hence the additional elements in the claims do not amount to significantly more than an abstract idea. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a processor system, a computer readable storage medium, one or more processors, a first-party component, and a second-party component to perform the limitations/steps listed above amounts to no more than mere instructions to apply the exception using generic computer components. 
Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the steps of “transmit…”, “receive…”, and “provide…”, in claims 7, 8, 15, 22, and 23, were considered to be extra-solution activity in Step 2A, Prong 2, and thus they are re-evaluated in Step 2B to determine whether they are more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claims). Thereby, a conclusion that the claimed “transmit…”, “receive…”, and “provide…” steps (in claims 7, 8, 15, 22, and 23) are well-understood, routine, conventional activity is supported under Berkheimer. Hence, the dependent claims 2-8, 10-15, and 17-23 (as amended) are not patent eligible because they neither amount to significantly more than an abstract idea nor provide an inventive concept. 


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

15.	Claims 1-5, 7-13, 15-20, and 22-25 are rejected under 35 U.S.C. 103 as being unpatentable over Le (US  A1, hereinafter referred to as “Le”) in view of Stankiewicz et al. (US 20190065550 A1), in further view of Xiong et al. (US 20180129938 A1), and in further view of Quoc Le and Tomas Mikolov, “Distributed Representations of Sentences and Documents”, Proceedings of the 31st International Conference on Machine Learning, 2014 (hereinafter referred to as “Le and Mikolov”). 

Regarding claim 1, Le teaches a computer-implemented method for performing unsupervised feature representation learning for text data comprising:
generating, by a processor system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, …, and … (Le, Paragraph [0028], teaches generating a document vector representation for a given document; Le, Paragraph [0035], teaches that the system processes multiple word sequences from the document to determine the document vector representation and further teaches that each of the sequences is a fixed length and that the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document; Le, Paragraph [0038], teaches that the system maps each of the words of a sequence to a respective word vector representation using the embedding layer [Note: Le, [0038] interpreted as word embedding].); 

…; and 

providing, by the processor system, the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being the input provided to improve processing efficiency of the one or more machine learning models. (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system, e.g., a logistic regression system, a support vector machines system, or k-means system; wherein the conventional machine learning system may be configured to receive the document representation of the input document, and generate a score representing the estimated likelihood that the document is about or related to the corresponding topic. [Note: reading on configured to infer a relationship between the input of the feature matrix and an output.]).

However, Le does not distinctly disclose 
wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value
and wherein the random words of each text sequence in the set are drawn from a random probability distribution of the raw text data …. 

Nevertheless, Le and Mikolov teach wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value (Le and Mikolov, section 2.3 teaches words randomly sampled from the paragraph; Le and Mikolov, section 3.3 teaches randomly sampled paragraph)…

The combination does not teach, but Stankiewicz does teach, wherein the random words of each text sequence in the set are drawn from a random probability distribution of raw text data … (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.)
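For illustration only, and not as a characterization of any cited reference or of Applicant's specification, the sampling limitations recited above (random lengths between a minimum and a maximum, random words drawn from a probability distribution of the raw text data) can be sketched as follows. The function names, the parameters d_min and d_max, and the use of an empirical unigram distribution are assumptions made for this sketch; the claims recite a distribution derived from a pre-trained word vector space, which is not modeled here.

```python
import random
from collections import Counter


def build_word_distribution(raw_docs):
    """Empirical unigram distribution over the raw text (an assumed stand-in
    for the claimed probability distribution of the raw text data)."""
    counts = Counter(word for doc in raw_docs for word in doc.split())
    words = sorted(counts)
    weights = [counts[w] for w in words]
    return words, weights


def generate_random_sequences(raw_docs, num_sequences, d_min, d_max, seed=0):
    """Return num_sequences random word sequences.

    Each sequence has a length sampled uniformly from d_min through d_max
    (inclusive), and each word is drawn from the empirical distribution.
    """
    if not 0 < d_min < d_max:
        raise ValueError("d_min must be positive and smaller than d_max")
    rng = random.Random(seed)
    words, weights = build_word_distribution(raw_docs)
    sequences = []
    for _ in range(num_sequences):
        length = rng.randint(d_min, d_max)  # both endpoints are included
        sequences.append(rng.choices(words, weights=weights, k=length))
    return sequences
```

Under these assumptions, every generated sequence has a length between the first value (d_min) and the greater second value (d_max), and contains only words that occur in the raw text.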

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences. (Stankiewicz, Paragraphs [0040]-[0041]).

However, the combination does not distinctly disclose … raw text data derived from a pre-trained word vector space.

Nevertheless, Xiong teaches … raw text data derived from a pre-trained word vector space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model to obtain a fixed word embedding of each word in a document and a question, and, in other implementations, to generate character embeddings and/or phrase embeddings.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question. (Xiong, Paragraph [0038]).

	Examiner believes that Le teaches or at least implies generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as modified by the initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, to further include the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models. (Le and Mikolov, section 3.3 and section 5).


	Regarding claim 2, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 1, and the combination further teaches wherein generating the feature matrix includes:

computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]
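
For illustration only, and not as a characterization of any cited reference, the two steps recited in claim 2 (computing feature vectors with a document distance measure, then concatenating them into a feature matrix) can be sketched as follows. Jaccard distance over word sets is used here purely as an assumed, simple stand-in for a document distance measuring technique; the function names are hypothetical.

```python
def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| over word sets (an assumed stand-in distance)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def build_feature_matrix(raw_docs, random_sequences):
    """One feature vector per random sequence, concatenated column-wise.

    Column j holds the distance from every raw document to random sequence j,
    so the result has one row per document and one column per sequence.
    """
    tokenized = [doc.split() for doc in raw_docs]
    # Step 1: one feature vector (column) per random sequence.
    columns = [
        [jaccard_distance(tokens, seq) for tokens in tokenized]
        for seq in random_sequences
    ]
    # Step 2: concatenate the columns into a row-major feature matrix.
    return [list(row) for row in zip(*columns)]
```

In this sketch, a document that shares every word with a random sequence receives a distance of 0.0 in that column, and a document that shares none receives 1.0; the resulting rows could then be passed as input features to a downstream model.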

Motivation to combine same as stated for claim 1. 



Regarding claim 3, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 1, and the combination further teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

	Motivation to combine same as stated for claim 1. 



Regarding claim 4, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 1, and the combination further teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

Motivation to combine same as stated for claim 1.



Regarding claim 5, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 4, and the combination further teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 

Motivation to combine same as stated for claim 1. 


Regarding claim 7, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 4, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the probability distribution from the raw text data, transmit the random probability distribution of the raw text data to the second-party component (Le, Paragraph [0028] and [0035] teaches generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] further teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”; ), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).
 
[EXAMINER NOTE: As previously stated, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

Motivation to combine same as stated for claim 1. 



Regarding claim 8, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov, teaches all of the limitations of claim 7, and the combination further teaches wherein the second-party component is configured to receive the random probability distribution from the first-party component, generate the reference text data, transmit the reference text data to the first-party component, receive the generated feature matrix from the first-party component, provide the feature matrix as the input to the one or more machine learning models, and transmit results from the machine learning models to the first-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed; Le, Paragraph [0028] and [0035] teaches generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]; Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And, Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”; Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).

Motivation to combine same as stated for claim 1. 



Regarding claim 9, Le teaches a computer program product for performing unsupervised feature representation learning for text data, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor system to cause the processor system to perform a method (Le, Paragraph [0006] teaches computer programs recorded on one or more computer storage devices configured to perform the actions of the methods; Le, Paragraph [0054] teaches “one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus”.) comprising: 

generating, by a processor system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, …, and … (Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer [Note: Le, [0038] interpreted as word embedding].); 

…; and 

providing, by the processor system, the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being the input provided to improve processing efficiency of the one or more machine learning models. (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system, e.g., a logistic regression system, a support vector machines system, or k-means system; wherein the conventional machine learning system may be configured to receive the document representation of the input document, and generate a score representing the estimated likelihood that the document is about or related to the corresponding topic. [Note: reading on configured to infer a relationship between the input of the feature matrix and an output.]).

However, Le does not distinctly disclose 
wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value
and wherein the random words of each text sequence in the set are drawn from a random probability distribution of the raw text data …. 

Nevertheless, Le and Mikolov teach wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value (Le and Mikolov, section 2.3 teaches words randomly sampled from the paragraph; Le and Mikolov, section 3.3 teaches randomly sampled paragraph)…


The combination does not teach, but Stankiewicz does teach, wherein the random words of each text sequence in the set are drawn from a random probability distribution of raw text data … (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences. (Stankiewicz, Paragraphs [0040]-[0041]).

However, the combination does not distinctly disclose … raw text data derived from a pre-trained word vector space.

Nevertheless, Xiong teaches … raw text data derived from a pre-trained word vector space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model to obtain a fixed word embedding of each word in a document and a question, and, in other implementations, to generate character embeddings and/or phrase embeddings.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question. (Xiong, Paragraph [0038]).

	Examiner believes that Le teaches or at least implies generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the processor system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as modified by the initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, to further include the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models. (Le and Mikolov, section 3.3 and section 5).


Regarding claim 10, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 9, and the combination further teaches wherein generating the feature matrix includes:

computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]

Motivation to combine same as stated for claim 9. 



Regarding claim 11, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 9, and the combination further teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

	Motivation to combine same as stated for claim 9. 



Regarding claim 12, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 9, and the combination further teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

Motivation to combine same as stated for claim 9.


Regarding claim 13, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 12, and the combination further teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 

	Motivation to combine same as stated for claim 9. 


Regarding claim 15, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov, teaches all of the limitations of claim 12, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the random probability distribution from the raw text data, transmit the random probability distribution of the raw text data to the second-party component (Le, Paragraph [0028] and [0035] teaches generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] further teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”; ), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraph vectors as features and feeding these features to conventional machine learning techniques.]).
 
[EXAMINER NOTE: As previously stated, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

Motivation to combine same as stated for claim 9.



Regarding claim 16, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising one or more processors configured to perform a method (Le, Paragraphs [0020], [0023], and [0024] teach a system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s).) comprising: 
generating, by the system, reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, …, and … (Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer [Note: Le, [0038] interpreted as word embedding].); 

…; and 

providing, by the system, the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being the input provided to improve processing efficiency of the one or more machine learning models. (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system, e.g., a logistic regression system, a support vector machines system, or k-means system; wherein the conventional machine learning system may be configured to receive the document representation of the input document, and generate a score representing the estimated likelihood that the document is about or related to the corresponding topic. [Note: reading on configured to infer a relationship between the input of the feature matrix and an output.]).

However, Le does not distinctly disclose 
wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value
and wherein the random words of each text sequence in the set are drawn from a random probability distribution of the raw text data …. 
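
For orientation only, the limitation at issue may be pictured with the minimal sketch below. It is not taken from Applicant's specification or from any cited reference. The function name `generate_reference_text` and the parameters `vocab`, `weights`, `min_len`, `max_len`, and `seed` are hypothetical, and the weight list is assumed to stand in for whatever probability distribution is derived from the raw text data.

```python
import random


def generate_reference_text(vocab, weights, num_sequences, min_len, max_len, seed=0):
    """Return num_sequences random word sequences (hypothetical helper).

    Each sequence length is sampled uniformly from min_len through max_len
    (inclusive), and each word is drawn from vocab according to weights.
    """
    if not (0 < min_len < max_len):
        raise ValueError("min_len must be positive and smaller than max_len")
    rng = random.Random(seed)
    sequences = []
    for _ in range(num_sequences):
        # Random length between the first value and the larger second value.
        length = rng.randint(min_len, max_len)
        # Random words drawn from the (assumed) probability distribution.
        sequences.append(rng.choices(vocab, weights=weights, k=length))
    return sequences


if __name__ == "__main__":
    for seq in generate_reference_text(["cat", "dog", "fish"], [0.5, 0.3, 0.2], 3, 2, 4):
        print(seq)
```

The sketch contrasts with the fixed-length sliding window of Le, Paragraph [0035], in that the sequence length varies between two bounds.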

Nevertheless, Le and Mikolov teach wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value (Le and Mikolov, section 2.3 teaches words randomly sampled from the paragraph; Le and Mikolov, section 3.3 teaches randomly sampled paragraph)…

The combination does not, but Stankiewicz does, teach wherein the random words of each text sequence in the set are drawn from a random probability distribution of raw text data …(Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences. (Stankiewicz, Paragraphs [0040]-[0041]).

However, the combination does not distinctly disclose … raw text data derived from a pre-trained word vector space.

Nevertheless, Xiong teaches … raw text data derived from a pre-trained word vector space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model to obtain a fixed word embedding of each word in a document and a question, and, in other implementations, to generate character embeddings and/or phrase embeddings.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question. (Xiong, Paragraph [0038]).

	Examiner believes that Le teaches or at least implies generating, by the system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.). However said limitation is more clearly and distinctly taught by Le and Mikolov as provided below.

	Le and Mikolov teaches generating, by the system, a feature matrix for raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.);

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as modified by the initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, to further include the teachings of concatenating paragraphs vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models. (Le and Mikolov, section 3.3 and section 5).


Regarding claim 17, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 16, wherein generating the feature matrix includes: computing, by the processor system, a set of feature vectors between the raw text data and the set of random text sequences using a document distance measuring technique (Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries); and 

concatenating, by the system, the feature vectors to generate the feature matrix (Le, Paragraphs [0009] and [0040] teach concatenating vector representations of the words in the sequence with the vector representations of the input document to generate a combined representation.;).

	[EXAMINER NOTE: Le, Paragraph [0013] teaches using a vector representation of a document as a “feature” of the document, wherein the vector representations may allow for identification of semantically similar documents by examining how close together the document vector representations are to each other – reading on and/or otherwise implying using a document distance measuring technique, as claimed above.]
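
For illustration only, the two recited steps (computing feature vectors with a document distance measure, then concatenating them into a feature matrix) may be sketched as follows. The sketch is not drawn from the claims, the specification, or the cited references. Jaccard distance over word sets is an assumed stand-in for the claimed document distance measuring technique, and the function names are hypothetical.

```python
def jaccard_distance(doc_a, doc_b):
    """Distance in [0, 1] between two word lists (assumed stand-in measure)."""
    set_a, set_b = set(doc_a), set(doc_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def feature_vector(document, random_sequences):
    """One distance per random text sequence for a single document."""
    return [jaccard_distance(document, seq) for seq in random_sequences]


def feature_matrix(documents, random_sequences):
    """Stack the per-document feature vectors into a matrix (list of rows)."""
    return [feature_vector(doc, random_sequences) for doc in documents]


if __name__ == "__main__":
    docs = [["a", "b"], ["c"]]
    seqs = [["a", "b"], ["a"], ["d"]]
    for row in feature_matrix(docs, seqs):
        print(row)
```

Each row corresponds to one document and each column to one random text sequence, so the matrix has as many columns as there are sequences.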

Motivation to combine same as stated for claim 16. 



Regarding claim 18, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 16, and the combination further teaches wherein the distribution comprises a random probability distribution of a word vector space (Stankiewicz, Paragraph [0040] teaches using NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings to learn a weighted vector space over documents.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.)

	Motivation to combine same as stated for claim 9. 


Regarding claim 19, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 16, and the combination further teaches wherein the distribution comprises a probability distribution of a word vector space generated from the raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.).

Motivation to combine same as stated for claim 16.


Regarding claim 20, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 19, and the combination further teaches wherein the word vector space comprises a pre-trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.). 

Motivation to combine same as stated for claim 16.



Regarding claim 22, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov, teaches all of the limitations of claim 19, and the combination further teaches wherein the processor system comprises a two-party protocol system comprising a first-party component and a second-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed), wherein the first-party component is configured to generate the random probability distribution from the raw text data, transmit the random probability distribution of the raw text data to the second-party component (Le, Paragraph [0028] and [0035] teaches generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] further teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]), receive the reference text data from the second-party component, generate the feature matrix based at least in part on the set of random text sequences (Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”), and transmit the generated feature matrix to the second-party component (Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraphs vectors as feature and feeding these features to conventional machine learning techniques.]).
 
[EXAMINER NOTE: As previously stated, claims 7-8, 15, and 22-23 have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraphs [0063]-[0069] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 

Motivation to combine same as stated for claim 16.



Regarding claim 23, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov, teaches all of the limitations of claim 22, and the combination further teaches wherein the second-party component is configured to receive the random probability distribution from the first-party component, generate the reference text data, transmit the reference text data to the first-party component, receive the generated feature matrix from the first party-component, provide the feature matrix as the input to the one or more machine learning models, and transmit results from the machine learning models to the first-party component (Le, Figure 1, Vector Representation System 100 and Neural Network System 110 read on first and second party components, as claimed; Le, Paragraph [0028] and [0035] teaches generating document vector representation for a given document; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer; Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.; Stankiewicz, Paragraph [0057] teaches method disclosed may utilize one or more modules executing on a computer or computers and such modules may be separated or combined. [Note: the one or more modules reading on a two-party protocol system, as claimed.]; Le, Paragraphs [0009] and [0040] teach concatenating vector representations … to generate a combined representation.; And, Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features…”; Le, Paragraph [0028] further teaches providing the document representation to a separate system for some immediate purpose, for example, as input to a conventional machine learning system.; [Note: Le and Mikolov further discloses, in section 2.2, using paragraphs vectors as features and feeding these features to conventional machine learning techniques.]).

Motivation to combine same as stated for claim 16. 
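
For illustration only, the order of exchanges recited in claims 22 and 23 may be traced with the sketch below. It is not taken from Applicant's specification or from any cited reference. The class names `FirstParty` and `SecondParty`, their methods, the word-frequency distribution, the Jaccard distance, and the row-average "model" are all hypothetical stand-ins chosen so that the sketch runs on its own.

```python
import random
from collections import Counter


class FirstParty:
    """Holds the raw text data (each document is a non-empty word list)."""

    def __init__(self, documents):
        self.documents = documents

    def make_distribution(self):
        # Assumed stand-in: relative word frequencies over the raw text.
        counts = Counter(w for doc in self.documents for w in doc)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def make_feature_matrix(self, reference_text):
        def distance(a, b):
            sa, sb = set(a), set(b)
            return 1.0 - len(sa & sb) / len(sa | sb)
        return [[distance(d, s) for s in reference_text] for d in self.documents]


class SecondParty:
    """Generates reference text and runs the (stand-in) model."""

    def make_reference_text(self, distribution, n, min_len, max_len, seed=0):
        rng = random.Random(seed)
        words = list(distribution)
        weights = [distribution[w] for w in words]
        return [rng.choices(words, weights=weights, k=rng.randint(min_len, max_len))
                for _ in range(n)]

    def run_model(self, matrix):
        # Placeholder for a machine learning model: average of each row.
        return [sum(row) / len(row) for row in matrix]


def run_protocol(documents):
    first, second = FirstParty(documents), SecondParty()
    distribution = first.make_distribution()                        # first -> second
    reference = second.make_reference_text(distribution, 4, 1, 3)   # second -> first
    matrix = first.make_feature_matrix(reference)                   # first -> second
    return second.run_model(matrix)                                 # second -> first
```

In this sketch only the distribution, the reference text, the feature matrix, and the results pass between the two components.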



Regarding claim 24, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising: a processor; a memory; (Le, Paragraph [0020], [0023], and [0024] teaches system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s) and memory.)

a reference text data generation component configured to … generate reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, …, and …(Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer. [Note: Le, [0038] interpreted as word embedding]); and 

a machine learning component (Le, Paragraph [0023] teaches system includes a neural network system) configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.); and 

provide the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being the input provided to improve the processing efficiency of the one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system, e.g., a logistic regression system, a support vector machines system, or k-means system; wherein the conventional machine learning system may be configured to receive the document representation of the input document, and generate a score representing the estimated likelihood that the document is about or related to the corresponding topic. [Note: reading on configured to infer a relationship between the input of the feature matrix and an output.]; [Note: Le and Mikolov, section 2.2, also teaches after being trained, using the paragraph vectors as features and feeding the features to a conventional machine learning technique].).

However, Le does not distinctly disclose: 
wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from the raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value
…to receive a random probability distribution of raw text data… 
…, and wherein the random words of each text sequence in the set are drawn from the random probability distribution of raw text data derived from a pre-trained word vector space;


Nevertheless, Le and Mikolov teaches wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from the raw text data in order to result in the set of random sequences, the minimum length being a first value, the maximum length being a second value greater than the first value (Le and Mikolov, section 2.3 teaches words randomly sampled from the paragraph; Le and Mikolov, section 3.3 teaches randomly sampled paragraph)

The combination does not, but Stankiewicz does, teach: 
…to receive a random probability distribution of raw text data… 
…, and wherein the random words of each text sequence in the set are drawn from the random probability distribution of raw text data… (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.)
	
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences. (Stankiewicz, Paragraphs [0040]-[0041]).

	However, the combination does not distinctly disclose … random probability distribution of raw text data derived from a pre-trained word vector space;

Nevertheless, Xiong teaches … random probability distribution of raw text data derived from a pre-trained word vector space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model to obtain a fixed word embedding of each word in a document and a question, and, in other implementations, to generate character embeddings and/or phrase embeddings.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include the initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question. (Xiong, Paragraph [0038]).

	Although Le discloses a machine learning component that may exchange data to and from the vector representation system 100 (Le, Paragraph [0023] teaches system includes a neural network system), Le does not distinctly disclose a machine learning component configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data. 

	Nevertheless, Le and Mikolov teaches a machine learning component configured to: receive a feature matrix for the raw text data, wherein the feature matrix is generated based at least in part on a set of computed distances between the set of random text sequences and the raw text data (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.). 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as further modified with initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, to further include the teachings of concatenating paragraphs vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models. (Le and Mikolov, section 3.3 and section 5). 

[EXAMINER NOTE: As previously stated, the limitations reference text data generation component and machine learning component, in claim 24, have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraph [0064] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 


Regarding claim 25, Le teaches a system for performing unsupervised feature representation learning for text data, the system comprising: a processor; a memory; (Le, Paragraph [0020], [0023], and [0024] teaches system comprising one or more computers; Le, Paragraphs [0055] and [0058] further disclose processor(s) and memory.)

a feature matrix generation component configured to: receive reference text data comprising a set of random text sequences, wherein each text sequence of the set of random text sequences is of a random length and comprises a number of random words, …, and wherein the random words of each text sequence in the set are drawn from the probability distribution (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system.; Le, Paragraph [0028], teaches generating document vector representation for a given document; Le, Paragraph [0035] teaches system processes multiple word sequences from a document to determine the document vector representation and further teaches each of the sequences is a fixed length and the system can apply a sliding window to the document to extract each possible sequence of a predetermined fixed length from the document.; Le, Paragraph [0038] teaches system maps each of the words of a sequence to a respective word vector representation using the embedding layer.; [Note: Le, [0038] interpreted as word embedding]); and …

and provide the feature matrix as an input to one or more machine learning models, wherein the one or more machine learning models are configured to infer a relationship between the input of the feature matrix and an output, the feature matrix being the input provided to improve processing efficiency of the one or more machine learning models (Le, Paragraph [0028] teaches providing the document representation to a separate system, for example, using the document representation as a feature of an input document which can be provided as input to a conventional machine learning system, e.g., a logistic regression system, a support vector machines system, or k-means system; wherein the conventional machine learning system may be configured to receive the document representation of the input document, and generate a score representing the estimated likelihood that the document is about or related to the corresponding topic. [Note: reading on configured to infer a relationship between the input of the feature matrix and an output.]).


However, Le does not distinctly disclose 
wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from the raw text data in order to result in a set of random text sequences, the minimum length being a first value, the maximum length being a second value greater than the first value

a distribution generation component configured to generate a random probability distribution of raw text data, wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space;…

Nevertheless Le and Mikolov teaches wherein the generating the reference text data comprises sampling random lengths from a minimum length through a maximum length from the raw text data in order to result in a set of random text sequences, the minimum length being a first value, the maximum length being a second value greater than the first value (Le and Mikolov, section 2.3 teaches words randomly sampled from the paragraph; Le and Mikolov, section 3.3 teaches randomly sampled paragraph)

The combination does not, but Stankiewicz does, teach: 
a distribution generation component configured to generate a random probability distribution of raw text data (Stankiewicz, Paragraph [0040] teaches NLP algorithms such as Latent Dirichlet Allocation (LDA) and word embeddings, use unstructured text [including raw text] to learn a weighted vector space.; Stankiewicz, Paragraph [0045] further teaches LDA learns a probability distribution over terms such that a document may then be represented as a likelihood distribution over topics based on terms in the document.),…

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, with the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, in order to allow a document to be understood by a machine and to easily compare documents for similarities and differences. (Stankiewicz, Paragraphs [0040]-[0041]).

However, the combination does not distinctly disclose … wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space;… 

Nevertheless, Xiong teaches: 
… wherein the probability distribution of raw text data is generated based at least in part on a pre-trained or trained word2vec embedding space (Xiong, Paragraph [0038] teaches initializing word embeddings using pre-trained word2vec word embedding model.); … 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, to further include initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, in order to obtain a fixed word embedding of each word in a document and a question. (Xiong, Paragraph [0038]). 

However, the combination does not distinctly disclose generate a feature matrix for the raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data using a document distance measuring technique. 

Nevertheless, Le and Mikolov teaches generate a feature matrix for the raw text data based at least in part on a set of computed distances between the set of random text sequences and the raw text data using a document distance measuring technique (Le and Mikolov, section 2.1, teaches “every word is mapped to a unique vector, represented by a column in a matrix. The column is indexed by position of the word in the vocabulary. The concatenation or the sum of the vectors is then used as features for prediction on the next word in a sentence.”; Le and Mikolov, section 3.3 further teaches to test vector representations of paragraphs, computing distances between paragraphs of the same query and/or paragraphs of different queries.). 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as further modified with initializing word embeddings using pre-trained word2vec word embedding model, as taught by Xiong, to further include the teachings of concatenating paragraphs vectors and word vectors via a matrix, as taught by Le and Mikolov, in order to predict surrounding words in contexts sampled from paragraphs with a framework that has the potential to overcome many weaknesses of bag-of-words models. (Le and Mikolov, section 3.3 and section 5).

[EXAMINER NOTE: As previously stated, the limitations feature matrix generation component and distribution generation component, in claim 25, have been understood to invoke 35 U.S.C. 112(f). Examiner has identified the structure and algorithms described in Paragraph [0064] of Applicant’s specification as sufficient to perform the functional limitations recited in the claim(s).] 


16.	Claims 6, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Le in view of Stankiewicz, Xiong, and Le and Mikolov, and in further view of Franziska Horn, “Context encoders as a simple but powerful extension of word2vec”, June 2017.


Regarding claim 6, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 4. However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as further modified with initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2).
	


Regarding claim 14, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 12.  However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as further modified with initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2).
 


Regarding claim 21, the combination of Le in view of Stankiewicz, Xiong, and Le and Mikolov teaches all of the limitations of claim 19. However, the combination does not distinctly disclose wherein the word vector space comprises a trained word2vec embedding space.

Nevertheless, Horn teaches wherein the word vector space comprises a trained word2vec embedding space (Horn, Abstract and page 1, col.2, ¶ 2, teaches using “trained word2vec embeddings” - reading on the limitation as claimed.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the methods, system and computer programs for generating document representations taught by Le, as modified by the teachings of using NLP algorithms such as LDA and word embeddings over structured and unstructured text data, as taught by Stankiewicz, as further modified with initializing word embeddings using a pre-trained word2vec word embedding model, as taught by Xiong, as modified by the teachings of concatenating paragraph vectors and word vectors via a matrix, as taught by Le and Mikolov, to further include the teachings of multiplying trained word2vec embeddings with a word’s average context vector, as taught by Horn, in order to allow for easy creation of out-of-vocabulary embeddings as well as a better representation of words with multiple meanings (Horn, Abstract and page 1, col.2, ¶ 1-2).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri. 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.B./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123