DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-3 and 8-17 have been amended by Applicant. No claims have been currently added or cancelled. Claims 1-20 are currently pending.

Response to Arguments
Claim Objections
Objection to claims 8 and 15 has been withdrawn in view of Applicant’s amendments to said claims. 
Claim Rejections under 35 U.S.C. 112(b)
The rejection of claims 1, 8, and 15 (as amended), claims 3, 10, and 17 (as amended), and claims 2, 9, and 16 (as amended) under 35 U.S.C. 112(b) has been maintained herein.
Claim Rejections under 35 U.S.C. 101
The rejection under 35 U.S.C. 101 has been maintained herein.
Claim Rejections under 35 U.S.C. 103
The rejection of claims 1-5, 8-12, and 15-18 under 35 U.S.C. 103 has been maintained herein.
The rejection of claims 6, 13, and 19 under 35 U.S.C. 103 has been maintained herein. 
The rejection of claims 7, 14, and 20 under 35 U.S.C. 103 has been maintained herein.
Applicant's arguments filed 06/29/2022 have been fully considered but they are not persuasive. 
Applicant argues (on pages 15-16 of Applicant’s Remarks) that neither He nor He II suggests deriving a vector for a query by combining all of the vectors of the query, and thus that the references do not teach or suggest “identifying a plurality of subwords from each of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of the k-grams of words appearing in each of the queries, wherein k>1; obtaining, for each of the plurality of subwords of each of the queries a corresponding vector; deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query,” as recited in claim 1.
	Examiner respectfully disagrees with Applicant’s argument above, as it is directly contradicted by the He II reference itself. He II was cited as teaching the argued limitations. Specifically, He II, Paragraph [0007] was cited as teaching a method for processing an input query, wherein the input query comprises a plurality of words. He II, Paragraph [0017] further teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”. He II, Paragraph [0024] teaches that there can be a recurrent neural network trained on unigrams, another specific to bigrams, and so on.
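For illustration only, the n-gram split that He II, Paragraph [0017] describes for the query string “Call John Smith” can be sketched as follows. The function name word_ngrams and the max_n parameter are hypothetical and are not taken from He II or from the claims; the sketch assumes simple whitespace tokenization.

```python
def word_ngrams(query, max_n=3):
    """Return a dict mapping n to the list of word n-grams in the query.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    words = query.split()
    grams = {}
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive words across the query.
        grams[n] = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


grams = word_ngrams("Call John Smith")
for n, items in grams.items():
    print(n, items)
```

Under these assumptions, the three-word query yields three unigrams, two bigrams, and one trigram, matching the breakdown quoted from He II above.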
	Furthermore, He II, Paragraph [0017] was cited as teaching receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations; He II, Paragraph [0019] further teaches “a pre-processing step can be applied to convert the sequences into a format usable by a recurrent neural network, such as a vector format”; and He II, Paragraph [0022] teaches “Continuing the example, the unigrams "Call", "John", and "Smith" may all be in an embedding and be converted into vectors. For the bigrams, there may be an embedding for "John Smith"…”.
Lastly, He II, Paragraph [0062] was cited as teaching that the recurrent neural network model uses max-pooling on forward and backward output to combine the results from unigrams, bigrams, and trigrams. The outputs of the forward and backward networks are then concatenated together to become an output of the n-gram of the next layer of the recurrent neural network or the output. In the final layer, the outputs of each word from the forward and backward recurrent neural network models are concatenated in a single vector.
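As a simplified, non-authoritative sketch of combining per-n-gram vectors by max-pooling and concatenation, consider the toy example below. It does not reproduce He II's forward and backward recurrent networks; the function names and the two-dimensional vectors are assumptions chosen only to show how several subword vectors can be reduced to a single query vector.

```python
def max_pool(vectors):
    """Element-wise maximum across a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def combine(vectors_by_order):
    """Max-pool the vectors of each n-gram order, then concatenate the results."""
    combined = []
    for vectors in vectors_by_order:
        combined.extend(max_pool(vectors))
    return combined


# Hypothetical 2-dimensional vectors for the n-grams of a three-word query.
unigram_vecs = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
bigram_vecs = [[0.5, 0.1], [0.2, 0.6]]
trigram_vecs = [[0.3, 0.8]]

query_vec = combine([unigram_vecs, bigram_vecs, trigram_vecs])
print(query_vec)
```

In this sketch every unigram, bigram, and trigram vector contributes to the pooled result, and the concatenated output is one vector for the whole query.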
	In view of the foregoing, the rejection of claim 1 and analogous claims 8 and 15 has been maintained under 35 U.S.C. 103.
	For at least the same reasons stated for claim 1, the rejection of the dependent claims has also been maintained under 35 U.S.C. 103.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 8, and 15 (as amended), claims 3, 10, and 17 (as amended), and claims 2, 9, and 16 (as amended) are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 8, and 15 (as amended) are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps. See MPEP § 2172.01. Claims 1, 8, and 15 recite the same, or analogous, limitation of “training, via learning, a query/ads model, by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries”. However, it is unclear, and/or there appear to be missing steps as to, how one would train the claimed query/ads model via learning by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries. Hence, the claims have been rejected under 35 U.S.C. 112(b).
Claims 3, 10, and 17 (as amended) are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps. See MPEP § 2172.01. Claims 3, 10, and 17 (as amended) recite the same, or analogous, limitation of “wherein the training the query/ads model further includes optimizing an input vector u for each of the plurality of subwords associated with each of the queries”. However, it is unclear, and/or there appear to be missing steps as to, how one would optimize the recited input vectors and matrix. Hence, the claims have been rejected under 35 U.S.C. 112(b).
Claims 2, 9, and 16 (as amended) are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being incomplete for omitting essential steps, such omission amounting to a gap between the steps. See MPEP § 2172.01. Claims 2, 9, and 16 (as amended) recite the same, or analogous, limitation of “…a plurality of parameters associated with the neural network are optimized”. However, it is unclear, and/or there appear to be missing steps as to, how one would “optimize” the “parameters”. Hence, the claims have been rejected under 35 U.S.C. 112(b).


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4, 8, 11, and 15 (as amended) are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	Claims 1, 8, and 15 (as amended) are respectively drawn to a method, a machine readable medium, and a system, hence each falls under one of four categories of statutory subject matter (Step 1). Nonetheless, the claims are directed to a judicial exception of an abstract idea without significantly more. 

	Independent claims 1, 8, and 15 (as amended) recite the same or similar limitations: 
receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks;
identifying a plurality of subwords from each of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1; 
obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector;
deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query; and
training, via learning, a query/ads model, by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries. 

Step 2A, Prong 1: 
The limitation of “identifying a plurality of subwords from each of the queries…”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15), nothing in the claim element precludes the step from practically being performed in the mind and/or with the aid of pen and paper. For example, but for the processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15) language, “identifying a plurality of subwords from each of the queries…” in the context of this claim encompasses a person mentally or with the aid of pen and paper reading a query string and identifying all of the words/characters/word combinations appearing in a query string/advertisement/hyperlink.
Similarly, the limitation of “obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper, but for the recitation of generic computer components. For example, but for the processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15) language, “obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector” in the context of this claim encompasses a person manually with the aid of pen and paper identifying all of the single words/characters [unigrams] and all of the word/character combinations [k-grams] appearing in a query string and initializing a list for each [i.e., unigrams and k-grams] using vector notation.
Similarly, the limitation of “deriving, via a neural network, a vector for each of the queries, …”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper, but for the recitation of generic computer components. For example, but for the processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15) language, “deriving, …, a vector for each of the queries,…” in the context of this claim encompasses a person manually with the aid of pen and paper combining/concatenating query strings using vector notation.
Similarly, the limitation of “training, via learning, a query/ads model, by optimizing vectors…”, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper, but for the recitation of generic computer components. For example, but for the processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15) language, “training, via learning, a query/ads model, by optimizing vectors…” in the context of this claim encompasses a person manually with the aid of pen and paper optimizing vectors with respect to a defined cost/objective function.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind with the aid of pen and paper, then it falls within the “Mental Process” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.

	Step 2A, Prong 2: The judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements of a processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15) to perform the limitations/steps listed above. These components in all steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(h) – generally linking the use of the judicial exception to a particular technological environment or field of use). Further, the “receiving…” step is recited at a high level of generality and amounts to mere data transmission, which is a form of insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Hence, the claims are directed to an abstract idea.

	Step 2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of using a processor, storage, a communication platform (claim 1), a machine readable storage medium (claim 8), and “a subword vector generator” [i.e., by a processor], “a subword vector combiner” [i.e., by a processor] and “a query/ads model optimization engine” [i.e., by a processor] (claim 15), and “via a neural network” (claims 1, 8, 15) to perform the limitations/steps listed above amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the “receiving…” step was considered to be extra-solution activity in Step 2A, Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “receiving or transmitting data over a network” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claims). Therefore, a conclusion that the claimed “receiving…” step is well-understood, routine, conventional activity is supported under Berkheimer. Hence, the claims are not patent eligible.

	Dependent claims 4 and 11 (as amended) are also ineligible for the same reasons given with respect to claims 1, 8, and 15 (as amended). The dependent claims describe additional mental processes: 
wherein each of the k-grams includes one or more consecutive words appearing in each of the queries (claims 4 and 11) (i.e., this limitation does not include an active functional limitation/step and merely further describes what the k-grams can include).

Again, the dependent claims continue to cover the performance of the limitations in the mind as inherited from independent claims 1 and 8 (Step 2A, Prong 1). The dependent claims' reliance on a processor, storage, a communication platform (claim 1), and a machine readable storage medium (claim 8) to perform their limitations amounts to no more than using generic computer components to apply the exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea (Step 2A, Prong 2; see MPEP 2106.05(h)). Hence, the additional elements in the claims do not amount to significantly more than an abstract idea.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 8-12, and 15-18 (as amended) are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20150278200 A1) in view of He et al. (US 20170286401 A1, hereinafter “He II”).

Regarding claim 1, He teaches a method, implemented on a machine having at least one processor, storage, and a communication platform (He, Paragraph [0046]-[0048] teach one or more computing devices, data stores, and at least one computer network.) for obtaining a model for identifying content matching a query, comprising: 
receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads; [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]); 

identifying a plurality of subwords from each of the queries in the training data … (He, Paragraph [0060] teaches that a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches that, in a preliminary operation, the training system operates on the linguistic items in the training set, as expressed in letter-trigram window vector form, the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms)
…

training, via learning, a query/ads model, by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries (He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism 902 to iteratively achieve an objective defined an objective function … When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches training employing gradient-based optimization; He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and the second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraph [0121] teaches that the vectors are obtained using a deep learning model such as a convolutional neural network; He, Paragraph [0087] further teaches that, based on the concept vectors derived from the received input queries and one or more other linguistic items, the system can compute several types of similarity measures, such as query-to-keyword concept vectors, query-to-ad concept vectors, ad-part-to-ad-part concept vectors, and so on; He, Paragraph [0080] teaches “intelligently match incoming queries with appropriate keywords, e.g., to improve the relevance of ads that are presented to the user”).
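As a non-authoritative, toy-scale sketch of an iterative, gradient-based solving mechanism of the general kind described in He, Paragraphs [0099] and [0104], the following repeatedly updates a query vector to reduce an objective function. The squared-distance objective, the learning rate, the step count, and all names are assumptions made for illustration; He's actual objective function and model parameters are not reproduced here.

```python
def objective(query_vec, ad_vec):
    """Assumed toy objective: half the squared distance between the two vectors."""
    return 0.5 * sum((q - a) ** 2 for q, a in zip(query_vec, ad_vec))


def train(query_vec, ad_vec, learning_rate=0.1, steps=50):
    """Plain gradient descent on the query vector; the ad vector is held fixed."""
    current = list(query_vec)
    for _ in range(steps):
        # Gradient of the objective with respect to each query component is (q - a).
        current = [q - learning_rate * (q - a) for q, a in zip(current, ad_vec)]
    return current


# Hypothetical starting vectors for one (query, clicked ad) training pair.
initial_query_vec = [1.0, -2.0]
ad_vec = [0.5, 0.5]

trained_query_vec = train(initial_query_vec, ad_vec)
initial_loss = objective(initial_query_vec, ad_vec)
final_loss = objective(trained_query_vec, ad_vec)
print(initial_loss, final_loss)
```

When the iterations finish, the final vector values play the role of the trained parameters in this sketch, and the objective value is lower than at the start.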

However, He does not distinctly disclose identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1; obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector; and

deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query.

Nevertheless, He II teaches deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query (He II, Paragraph [0062] teaches that the recurrent neural network model uses max-pooling on forward and backward output to combine the results from unigrams, bigrams, and trigrams. The outputs of the forward and backward networks are then concatenated together to become an output of the n-gram of the next layer of the recurrent neural network or the output. In the final layer, the outputs of each word from the forward and backward recurrent neural network models are concatenated in a single vector.).


Furthermore, He II teaches identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1 (He II, Paragraph [0007] teaches a method for processing an input query, wherein the input query comprises a plurality of words; He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”; He II, Paragraph [0024] teaches that there can be a recurrent neural network trained on unigrams, another specific to bigrams, and so on); and

obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector (He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations; He II, Paragraph [0019] further teaches “a pre-processing step can be applied to convert the sequences into a format usable by a recurrent neural network, such as a vector format”; He II, Paragraph [0022] teaches “Continuing the example, the unigrams "Call", "John", and "Smith" may all be in an embedding and be converted into vectors. For the bigrams, there may be an embedding for "John Smith"…”).


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, with the input query processing, as taught by He II, in order to provide robust natural language processing functionality with a small memory footprint and with little need for external resources. (He II, Paragraph [0016]).  


Regarding claim 2, the combination of He in view of He II teaches all of the limitations of claim 1, and the combination further teaches wherein the neural network includes a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network.); and

 	a plurality of parameters associated with the neural network are optimized (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism… When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches using gradient-based numerical optimization, reading on the limitation as claimed).

wherein the neural network includes … a recurrent neural network (RNN), or both (He II, Paragraph [0024] teaches recurrent neural network); and

a plurality of parameters associated with the neural network are optimized (He II, Paragraphs [0024] and [0047] teach recurrent neural network training).

Motivation to combine same as stated for claim 1.


Regarding claim 3, the combination of He in view of He II teaches all of the limitations of claim 1, and the combination further teaches wherein the training the query/ads model further includes optimizing an input vector u for each of the plurality of subwords associated with each of the queries (He II, Paragraph [0024], “there can be a recurrent neural network specific to unigrams (e.g., trained on unigrams), another specific to bigrams, and so on.”; He, Paragraph [0104] teaches using gradient-based numerical optimization), an input vector u for each of the advertisements and hyperlinks, and a matrix (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and the second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraphs [0099] and [0104] teach optimization involving training; He, Paragraph [0094] teaches that the training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches that click-through data describes queries, keywords, and clicked ads [Note: clicked ads reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”).

	Motivation to combine same as stated for claim 1.


Regarding claim 4, the combination of He in view of He II teaches all of the limitations of claim 1, and the combination further teaches wherein each of the k-grams includes one or more consecutive words appearing in each of the [queries] (He II, Paragraph [0007] teaches a method for processing an input query, wherein the input query comprises a plurality of words; He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”).

Motivation to combine same as stated for claim 1.


Regarding claim 5, the combination of He in view of He II teaches all of the limitations of claim 2, and the combination further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches a CNN having a plurality of layers; He, Paragraph [0032] further teaches that the CNN has a convolution matrix [reading on CNN filter, also known as CNN kernel] and a semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).

Motivation to combine same as stated for claim 1.


Regarding claim 8, He teaches a non-transitory machine readable medium having information recorded thereon for obtaining a model for identifying content matching a query, wherein the information, when read by the machine, causes the machine to perform the following:
receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads; [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]); 

identifying a plurality of subwords from each of the queries in the training data … (He, Paragraph [0060] teaches a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches, in a preliminary operation, the training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form, the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms);
…

training, via learning, a query/ads model, by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries (He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism 902 to iteratively achieve an objective defined an objective function … When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches training employing gradient-based optimization; He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network; He, Paragraph [0087] further teaches based on the concept vectors derived from the received input queries and one or more other linguistic items, the system can compute several types of similarity measures, such as query-to-keyword concept vectors, query-to-ad concept vectors, ad-part-to-ad-part concept vectors, and so on; He, Paragraph [0080] teaches “intelligently match incoming queries with appropriate keywords, e.g., to improve the relevance of ads that are presented to the user”).

However, He does not distinctly disclose identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1; obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector;

deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query; 

Nevertheless, He II teaches deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query (He II, Paragraph [0062] teaches the recurrent neural network model uses max-pooling on forward and backward output to combine the results from unigrams, bigrams, and trigrams. The output of the forward and backward network are then concatenated together to become an output of the n-gram of the next layer of the recurrent neural network or the output. In the final layer, the output of each word from the forward and backward recurrent neural network model are concatenated in a single vector.); and 

Furthermore, He II teaches identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1 (He II, Paragraph [0007] teaches a method for processing an input query wherein the input query comprises a plurality of words; He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”; He II, Paragraph [0024] teaches there can be a recurrent neural network trained on unigrams, another specific to bigrams, and so on);

obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector (He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations; He II, Paragraph [0019], further teaches “a pre-processing step can be applied to convert the sequences into a format usable by a recurrent neural network, such as a vector format”; He II, Paragraph [0022] teaches “Continuing the example, the unigrams "Call", "John", and "Smith" may all be in an embedding and be converted into vectors. For the bigrams, there may be an embedding for "John Smith"…”);


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, with the input query processing, as taught by He II, in order to provide robust natural language processing functionality with a small memory footprint and with little need for external resources. (He II, Paragraph [0016]).
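
By way of illustration only, the n-gram decomposition relied upon above (He II, Paragraph [0017]) may be sketched as follows. The function name `word_ngrams` and the parameter `max_n` are hypothetical and do not appear in He or He II; the sketch merely enumerates word-level unigrams, bigrams, and trigrams of the example query string and is not asserted to be the implementation of either reference.

```python
def word_ngrams(query, max_n=3):
    """Return every word-level n-gram of `query` for n = 1..max_n.

    Hypothetical helper for illustration; not taken from He or He II.
    """
    words = query.split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive words across the query.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


print(word_ngrams("Call John Smith"))
# ['Call', 'John', 'Smith', 'Call John', 'John Smith', 'Call John Smith']
```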



Regarding claim 9, the combination of He in view of He II teaches all of the limitations of claim 8, and the combination further teaches wherein the neural network includes a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network.); and

 	a plurality of parameters associated with the neural network are optimized (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism… When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches using gradient-based numerical optimization, reading on the limitation as claimed).

	wherein the neural network includes … a recurrent neural network (RNN) or both (He II, Paragraph [0024] teaches recurrent neural network); and

a plurality of parameters associated with the neural network are optimized (He II, Paragraphs [0024] and [0047] teach recurrent neural network training).

Motivation to combine same as stated for claim 8.



Regarding claim 10, the combination of He in view of He II teaches all of the limitations of claim 8, and the combination further teaches wherein the training the query/ads model further includes optimizing an input vector u for each of the plurality of subwords associated with each of the queries (He II, Paragraph [0024], “there can be a recurrent neural network specific to unigrams (e.g., trained on unigrams), another specific to bigrams, and so on.”; He, Paragraph [0104] teaches using gradient-based numerical optimization), an input vector u for each of the advertisements and hyperlinks, and a matrix (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraphs [0099] and [0104] teach optimization involving training; He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: clicked ads reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, the training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form, the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms).

Motivation to combine same as stated for claim 8.



Regarding claim 11, the combination of He in view of He II teaches all of the limitations of claim 8, and the combination further teaches wherein each of the k-grams includes one or more consecutive words appearing in each of the queries (He II, Paragraph [0007] teaches a method for processing an input query wherein the input query comprises a plurality of words; He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”).

Motivation to combine same as stated for claim 8.



Regarding claim 12, the combination of He in view of He II teaches all of the limitations of claim 9, and the combination further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches a CNN having a plurality of layers; He, Paragraph [0032] further teaches the CNN has a convolution matrix [reading on CNN filter – also known as CNN kernel] and a semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).

	Motivation to combine same as stated for claim 8.


Regarding claim 15, He teaches a system for obtaining a model for identifying content matching a query, comprising: 
a subword vector generator implemented by a processor and configured for receiving, via a communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]), identifying a plurality of subwords from each of the queries in the training data (He, Paragraph [0060] teaches a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches, in a preliminary operation, the training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form, the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms) …, and …;

…
a query/ads model optimization engine implemented by a processor and configured for training, via learning, a query/ads model, by optimizing vectors associated with the plurality of subwords of each of the queries and vectors for the queries (He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism 902 to iteratively achieve an objective defined an objective function … When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches training employing gradient-based optimization; He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network; He, Paragraph [0087] further teaches based on the concept vectors derived from the received input queries and one or more other linguistic items, the system can compute several types of similarity measures, such as query-to-keyword concept vectors, query-to-ad concept vectors, ad-part-to-ad-part concept vectors, and so on; He, Paragraph [0080] teaches “intelligently match incoming queries with appropriate keywords, e.g., to improve the relevance of ads that are presented to the user”).

However, He does not distinctly disclose identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1; obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector;

a subword vector combiner implemented by a processor and configured for deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query; 

Nevertheless, He II teaches a subword vector combiner implemented by a processor and configured for deriving, via a neural network, a vector for each of the queries by combining all of the plurality of corresponding vectors for the plurality of subwords of the query (He II, Paragraph [0062] teaches the recurrent neural network model uses max-pooling on forward and backward output to combine the results from unigrams, bigrams, and trigrams. The output of the forward and backward network are then concatenated together to become an output of the n-gram of the next layer of the recurrent neural network or the output. In the final layer, the output of each word from the forward and backward recurrent neural network model are concatenated in a single vector.); and

Furthermore, He II teaches identifying a plurality of subwords of the queries in the training data, wherein the plurality of subwords include: all of unigrams of words appearing in each of the queries, and all of k-grams of words appearing in each of the queries, wherein k>1 (He II, Paragraph [0007] teaches a method for processing an input query wherein the input query comprises a plurality of words; He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations. For example, the query string “Call John Smith” can be broken into the unigrams “Call”, “John”, and “Smith”; the bigrams “Call John” and “John Smith”; and the trigram “Call John Smith”; He II, Paragraph [0024] teaches there can be a recurrent neural network trained on unigrams, another specific to bigrams, and so on);

obtaining, for each of the plurality of subwords of each of the queries, a corresponding vector (He II, Paragraph [0017] teaches receiving an input as a string [i.e., query], wherein the input can be split into different representations such as different n-grams (e.g., unigram, bigram, trigram), concept vector sequences, or other representations; He II, Paragraph [0019], further teaches “a pre-processing step can be applied to convert the sequences into a format usable by a recurrent neural network, such as a vector format”; He II, Paragraph [0022] teaches “Continuing the example, the unigrams "Call", "John", and "Smith" may all be in an embedding and be converted into vectors. For the bigrams, there may be an embedding for "John Smith"…”);


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, with the input query processing, as taught by He II, in order to provide robust natural language processing functionality with a small memory footprint and with little need for external resources. (He II, Paragraph [0016]).
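
By way of illustration only, the combining operation relied upon above (He II, Paragraph [0062], max-pooling over the unigram, bigram, and trigram results followed by concatenation of the forward and backward outputs) may be sketched as follows. The function name `max_pool` and all numeric values are hypothetical toy inputs chosen for the example; they are not taken from He II, and the sketch is not asserted to reproduce the network of the reference.

```python
def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


# Hypothetical toy outputs for the unigram, bigram, and trigram networks.
forward_outputs = [[0.1, 0.7, -0.2], [0.4, 0.2, 0.0], [0.3, 0.9, -0.5]]
backward_outputs = [[0.0, 0.5, 0.6], [0.2, 0.1, 0.3], [-0.1, 0.8, 0.4]]

# Pool each direction across the n-gram results, then concatenate
# the two pooled vectors into a single vector.
pooled_forward = max_pool(forward_outputs)
pooled_backward = max_pool(backward_outputs)
combined = pooled_forward + pooled_backward

print(combined)  # [0.4, 0.9, 0.0, 0.2, 0.8, 0.6]
```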



Regarding claim 16, the combination of He in view of He II teaches all of the limitations of claim 15, and He further teaches wherein the neural network includes a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network); and

 	a plurality of parameters associated with the neural network are optimized (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism… When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches using gradient-based numerical optimization, reading on the limitation as claimed).


	However, He does not teach wherein the neural network includes a recurrent neural network (RNN), or both; 

Nevertheless, He II teaches: 
wherein the neural network includes a recurrent neural network (RNN), or both (He II, Paragraph [0024] teaches recurrent neural network); a plurality of parameters associated with the neural network are optimized (He II, Paragraphs [0024] and [0047] teach recurrent neural network training).


Motivation to combine same as stated for claim 15.


Regarding claim 17, the combination of He in view of He II teaches all of the limitations of claim 15, and the combination further teaches wherein the training the query/ads model further includes optimizing an input vector u for each of the plurality of subwords associated with each of the queries (He II, Paragraph [0024], “there can be a recurrent neural network specific to unigrams (e.g., trained on unigrams), another specific to bigrams, and so on.”; He, Paragraph [0104] teaches using gradient-based numerical optimization), an input vector u for each of the advertisements and hyperlinks, and a matrix (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: “clicked ads” reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, the training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form, the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms).

Motivation to combine same as stated for claim 15.



Regarding claim 18, the combination of He in view of He II teaches all of the limitations of claim 16, and the combination further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches a CNN having a plurality of layers; He, Paragraph [0032] further teaches the CNN has a convolution matrix [reading on CNN filter – also known as CNN kernel] and a semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).

Motivation to combine same as stated for claim 15.



22.	Claims 6, 13, and 19 (as amended) are rejected under 35 U.S.C. 103 as being unpatentable over He in view of He II, and in further view of Gao et al. (US 20190114348 A1). 


Regarding claim 6, the combination of He in view of He II teaches all of the limitations of claim 2, however, the combination does not distinctly disclose wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector. 

Nevertheless, Gao teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches a sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing an illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402.; Gao, Paragraphs [0083]-[0084] disclose an LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, as modified by the input query processing, as taught by He II, to further include the Long-Short-Term Memory (LSTM) network, as taught by Gao, in order to provide faster delivery of content, make more efficient use of system resources, and also produce better matching results. (Gao, Paragraph [0003]).



Regarding claim 13, the combination of He in view of He II teaches all of the limitations of claim 9, however, the combination does not distinctly disclose wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector. 

Nevertheless, Gao teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches a sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing an illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402.; Gao, Paragraphs [0083]-[0084] disclose an LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states).

Motivation to combine same as stated for claim 6.


Regarding claim 19, the combination of He in view of He II teaches all of the limitations of claim 16, however, the combination does not distinctly disclose wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector. 

Nevertheless, Gao teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches a sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing an illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402.; Gao, Paragraphs [0083]-[0084] disclose an LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states).

Motivation to combine same as stated for claim 6.

23.	Claims 7, 14, and 20 (as amended) are rejected under 35 U.S.C. 103 as being unpatentable over He in view of He II and Gao, and in further view of Xin et al. (US 20190065460 A1). 

Regarding claim 7, the combination of He in view of He II and Gao teaches all of the limitations of claim 6, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches a generator component maps a representation of a query and an instance of random information to a key term. [Note: the generator component, in Gao, Paragraph [0038], comprises the disclosed sequence-to-sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038] teaches “For a given sentence $(x_1, x_2, \ldots, x_n)$ containing n words, each represented as a d-dimensional vector, an LSTM computes a representation $\overrightarrow{h}_t$ of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output $[\overrightarrow{h}_t, \overleftarrow{h}_t]$. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches the bidirectional LSTM layers may deal with sequential data… given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, as modified by the input query processing, as taught by He II, as modified by the Long-Short-Term Memory (LSTM) network, as taught by Gao, to further include the bidirectional LSTM operation, as disclosed in Xin, in order to overcome the drawbacks of prior art approaches, which are difficult to develop and do not scale well, by providing a method and system in which named entity recognition (NER) can adapt to new languages and new domains (Xin, Paragraphs [0016] and [0038]).
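
By way of illustration only, the bi-directional operation relied upon above (Xin, Paragraph [0038], a forward pass and a backward pass over the same sequence, with the two resulting hidden states concatenated) may be sketched as follows. The sketch deliberately substitutes a simple toy transition function for the LSTM gating of Gao and Xin; the function names `run_chain` and `bidirectional_encode` and the input values are hypothetical and are not taken from any cited reference.

```python
import math


def run_chain(inputs, dim):
    """Apply a toy transition function cell by cell; return the last state.

    Assumption: a plain tanh update stands in for the LSTM transition
    function. Each step maps (current state, input) to the next state.
    """
    state = [0.0] * dim
    for x in inputs:
        state = [math.tanh(0.5 * s + 0.5 * xi) for s, xi in zip(state, x)]
    return state


def bidirectional_encode(inputs):
    """Concatenate the last forward state and the last backward state."""
    dim = len(inputs[0])
    forward_last = run_chain(inputs, dim)
    backward_last = run_chain(list(reversed(inputs)), dim)
    return forward_last + backward_last


# Hypothetical 2-dimensional vectors for three subwords of a query.
subword_vectors = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
query_vector = bidirectional_encode(subword_vectors)
print(len(query_vector))  # 4
```

The resulting vector has twice the dimensionality of a single state vector, because it carries one final state per reading direction.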



Regarding claim 14, the combination of He in view of He II and Gao teaches all of the limitations of claim 13, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches a generator component maps a representation of a query and an instance of random information to a key term. [Note: the generator component, in Gao, Paragraph [0038], comprises the disclosed sequence-to-sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038] teaches “For a given sentence $(x_1, x_2, \ldots, x_n)$ containing n words, each represented as a d-dimensional vector, an LSTM computes a representation $\overrightarrow{h}_t$ of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output $[\overrightarrow{h}_t, \overleftarrow{h}_t]$. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches the bidirectional LSTM layers may deal with sequential data… given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

The motivation to combine is the same as stated for claim 7.
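For illustration only, the bi-directional operation described in the cited passage of Xin (Paragraph [0038]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Xin, Gao, or the application: the function names (`make_params`, `lstm_step`, `run_lstm`, `bilstm_query_vector`), the random weights, the gate ordering, and the toy subword vectors are all hypothetical. It shows one forward pass and one reversed pass over the subword vectors of a query, with the final hidden state of each pass concatenated into a single query vector.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_params(d, h, seed):
    """Hypothetical random weights for one LSTM direction.
    W has 4*h rows (input, forget, output, candidate gates), each of length d + h."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d + h)] for _ in range(4 * h)]
    b = [0.0] * (4 * h)
    return W, b

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM cell update for input vector x (a list of floats)."""
    W, b = params
    n = len(h_prev)
    z = x + h_prev  # list concatenation: [x ; h_prev]
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:n]]            # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]    # output gate
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]  # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstm(xs, params, hidden):
    """Run an LSTM over the sequence xs and return the hidden state of the last cell."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h

def bilstm_query_vector(subword_vectors, fwd_params, bwd_params, hidden):
    """Concatenate the last forward state and the last backward state."""
    h_fwd = run_lstm(subword_vectors, fwd_params, hidden)
    h_bwd = run_lstm(list(reversed(subword_vectors)), bwd_params, hidden)
    return h_fwd + h_bwd

# Toy example: three subword vectors of dimension 3, hidden size 2 (all values made up).
D, H = 3, 2
subwords = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
fwd = make_params(D, H, seed=0)
bwd = make_params(D, H, seed=1)
query_vec = bilstm_query_vector(subwords, fwd, bwd, H)
```

In this sketch the resulting vector has length 2 × H, since each direction contributes one hidden state of size H. Whether the claimed "state vectors associated with the last LSTM cells" correspond to exactly these two final states is an assumption of the sketch, not a finding.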


Regarding claim 20, the combination of He in view of He II and Gao teaches all of the limitations of claim 19, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches that the generator component maps a representation of a query and an instance of random information to a key term. [Note: the generator component, in Gao, Paragraph [0038], comprises the disclosed sequence-to-sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038]: “For a given sentence $(x_1, x_2, \ldots, x_n)$ containing n words, each represented as a d-dimensional vector, an LSTM computes a representation $\overrightarrow{h}_t$ of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output $[\overrightarrow{h}_t, \overleftarrow{h}_t]$. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches that the bidirectional LSTM layers may deal with sequential data … given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

The motivation to combine is the same as stated for claim 7.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri., 7:30 a.m. - 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.B./ Examiner, Art Unit 2123
/ALEXEY SHMATOV/ Supervisory Patent Examiner, Art Unit 2123