DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are currently pending examination. 

Information Disclosure Statement
The information disclosure statements (IDSs) filed 08/14/2020 and 12/14/2020 have been considered.  

Claim Objections
Claims 7, 14, and 20 are objected to because of the following informalities:  
Claims 7, 14, and 20 appear to be missing the term “the” before “RNN”.
Appropriate correction is required.

Claim Interpretation
5.	Claims 2, 9, and 16 recite the following similar and/or same limitations as follows: “wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN); and the optimization further involves training of a plurality of parameters associated with the neural network.” [Emphasis added].
	For purposes of the rejection of claims 2, 9, and 16 (as stated below), Examiner has interpreted the limitation “includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN)…” as the neural network including a convolutional neural network (CNN) or a recurrent neural network (RNN), or both. 

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
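Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.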
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
“a subword vector generator configured for…”– in claim 15.
“a subword vector combiner configured for…” – in claim 15.
“a query/ads model optimization engine configured for…” – in claim 15.

Examiner has identified the structure(s) disclosed in Paragraphs [00102]-[00104] of Applicant’s Specification, as sufficient to perform the recited functions of the subword vector generator, the subword vector combiner, and the query/ads model optimization engine, as claimed. 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.




Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


10.	Claims 2, 9, and 16, claims 8 and 15, and claims 3, 10, and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. 

Claims 2, 9, and 16 recite the following similar and/or same limitations as follows: “wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN); and the optimization further involves training of a plurality of parameters associated with the neural network.” [Emphasis added]. It is not clear to the Examiner whether Applicant meant to claim a neural network that includes both a convolutional neural network (CNN) and a recurrent neural network (RNN) or whether the neural network includes a CNN or an RNN. Applicant’s disclosure (See, specification at Paragraph [0040] and Figs. 12A-B) would appear to only provide an enabling disclosure of the latter. Hence, for purposes of compact prosecution, Examiner has interpreted claims 2, 9, and 16 as the neural network including a convolutional neural network (CNN) or a recurrent neural network (RNN) or both. See Claim Rejections under 35 U.S.C. 103 further below. Examiner suggests Applicant consider rewording this limitation in claims 2, 9, and 16 as a Markush grouping, selecting from the group of A and B, per MPEP 2117(I). 

Claims 8 and 15 recite “…via the communication platform…”. However, there is insufficient antecedent basis for this limitation in the claim. [Note: For purposes of compact prosecution, Examiner has interpreted the limitation as reading “…via a communication platform…”.]

Claims 3, 10, and 17 recite “…each of the advertisements and links…”. However, there is insufficient antecedent basis for this limitation in the claim. [Note: For purposes of compact prosecution, Examiner has interpreted the limitation as referring to “the … hyperlinks”, as previously recited in independent claims 1, 8, and 15.]


The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 17 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  Claim 17 recites “…, wherein the optimization involves training of an input vector u for each of the plurality of subwords associated with each of the queries, an input vector u for each of the advertisements and links, and a matrix”. Said limitation is already recited in the same language in claim 15, from which claim 17 is dependent.  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.


Claim Rejections - 35 USC § 101
13.	Claim 8 and its dependent claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. As per claim 8, the claim recites a “machine readable medium,” which, under its broadest reasonable interpretation, covers a signal per se. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter). Therefore, claim 8 and its dependent claims are non-statutory. Examiner suggests that Applicant amend the claim to recite a non-transitory machine readable medium. 


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-4, 8, 10-11, 15 and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by He et al. (US 20150278200 A1). 

Regarding claim 1, He teaches a method, implemented on a machine having at least one processor, storage, and a communication platform (He, Paragraph [0046]-[0048] teach one or more computing devices, data stores, and at least one computer network.) for obtaining a model for identifying content matching a query, comprising: 
receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads; [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]); 

identifying a plurality of subwords from each of the queries in the training data (He, Paragraph [0060] teaches a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.); 

obtaining a plurality of vectors for the plurality of subwords of each of the queries (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad; as stated above, He, Paragraph [0098] teaches the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.); 

deriving, via a neural network, a vector for each of the queries based on the plurality of vectors for the plurality of subwords of the query (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad.; He, Paragraph [0121] further teaches transforming the first linguistic item into a first concept vector using a deep learning model, such as a convolutional neural network. And, transforming the second linguistic item into a second concept vector using the deep learning model.); and 

obtaining a query/ads model, via optimization with respect to an objective function, based on vectors associated with each of the plurality of subwords of each of the queries and vectors for the queries obtained from the neural network (He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism 902 to iteratively achieve an objective defined an objective function … When the iterative processing is finished, the final parameter values constitute the trained model.”; He, Paragraph [0104] teaches training employing gradient-based optimization; He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network; He, Paragraph [0087] further teaches based on the concept vectors derived from the received input queries and one or more other linguistic items, the system can compute several types of similarity measures, such as query-to-keyword concept vectors, query-to-ad concept vectors, ad-part-to-ad-part concept vectors, and so on; He, Paragraph [0080] teaches “intelligently match incoming queries with appropriate keywords, e.g., to improve the relevance of ads that are presented to the user”).



Regarding claim 3, He teaches all of the limitations of claim 1, and He further teaches wherein the optimization involves training of an input vector u for each of the plurality of subwords associated with each of the queries, an input vector u for each of the advertisements and links, and a matrix (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword , or an ad. The model being produced in a training phase based on clicked-through data; He, Paragraphs [0099] and [0104] teach optimization involving training.; He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads. [Note: clicked ads reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.).



Regarding claim 4, He teaches all of the limitations of claim 1, and He further teaches wherein the plurality of subwords includes at least one type including any of a unigram, k-gram, for k>1, and any combination thereof (He, Paragraph [0098] teaches conversion of queries and documents to their respective letter-trigram window vector forms. [Note: letter-trigram, as disclosed in He, reading on k-gram, for k>1, as claimed.]).
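For illustration only, the mapping of a letter-trigram onto a k-gram with k>1 can be sketched as follows. This is a minimal sketch and not code from He or from Applicant's specification: the function names, the "#" word-boundary marker, and the lowercasing step are all assumptions introduced here.

```python
def letter_kgrams(word, k=3, boundary="#"):
    """Return the letter k-grams of one word (k=3 gives letter-trigrams).

    The boundary marker is a hypothetical convention, not taken from the
    cited reference.
    """
    padded = f"{boundary}{word}{boundary}"
    # Slide a window of k characters across the padded word.
    return [padded[i:i + k] for i in range(len(padded) - k + 1)]


def query_subwords(query, k=3):
    """Split a query on whitespace and list the k-grams of each word."""
    return [(word, letter_kgrams(word.lower(), k)) for word in query.split()]


# Example: every subword produced with k=3 is a k-gram with k>1.
print(query_subwords("cheap flights"))
```

Under these assumptions, a trigram is simply the k=3 instance of the same sliding-window routine, which is the sense in which a letter-trigram is one species of the claimed k-gram genus.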



Regarding claim 8, He teaches machine readable medium having information recorded thereon for obtaining a model for identifying content matching a query, wherein the information, when read by the machine, causes the machine to perform the following: 
receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads; [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]); 

identifying a plurality of subwords from each of the queries in the training data (He, Paragraph [0060] teaches a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.); 

obtaining a plurality of vectors for the plurality of subwords of each of the queries (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad; as stated above, He, Paragraph [0098] teaches the preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.); 

deriving, via a neural network, a vector for each of the queries based on the plurality of vectors for the plurality of subwords of the query (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad.; He, Paragraph [0121] further teaches transforming a first linguistic item into a first concept vector using a deep learning model, such as a convolutional neural network. And, transforming a second linguistic item into a second concept vector using the deep learning model.); and 

obtaining a query/ads model, via optimization with respect to an objective function, based on vectors associated with each of the plurality of subwords of each of the queries and vectors for the queries obtained from the neural network.



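Regarding claim 10, He teaches all of the limitations of claim 8, and He further teaches wherein the optimization involves training of an input vector u for each of the plurality of subwords associated with each of the queries, an input vector u for each of the advertisements and links, and a matrix (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad, the model being produced in a training phase based on click-through data; He, Paragraphs [0099] and [0104] teach optimization involving training; He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: clicked ads reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.).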



Regarding claim 11, He teaches all of the limitations of claim 8, and He further teaches wherein the plurality of subwords includes at least one type including any of a unigram, k-gram, for k>1, and any combination thereof (He, Paragraph [0098] teaches conversion of queries and documents to their respective letter-trigram window vector forms. [Note: letter-trigram, as disclosed in He, reading on k-gram, for k>1, as claimed.]).



Regarding claim 15, He teaches a system for obtaining a model for identifying content matching a query, comprising: 
a subword vector generator configured for receiving, via the communication platform, training data comprising queries, advertisements, and hyperlinks (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: clicked ads reading on hyperlinks]; He, Paragraph [0131] teaches input module for receiving various inputs, and further teaches including one or more network interfaces for exchanging data via one or more networks [reading on communication platform as claimed]), identifying a plurality of subwords from each of the queries in the training data (He, Paragraph [0060] teaches a convolution module slides an n-word window across the word sequence to identify a series of word groupings; He, Paragraph [0098] teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.), and obtaining a plurality of vectors for the plurality of subwords of each of the queries (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad.); 

a subword vector combiner configured for deriving, via a neural network, a vector for each of the queries based on the plurality of vectors for the plurality of subwords of the query (He, Abstract, teaches transforming first and second linguistic items into first and second vectors, wherein the first linguistic item may correspond to a query, and a second linguistic item may correspond to a phrase, or a document, or a keyword, or an ad.; He, Paragraph [0121] further teaches transforming a first linguistic item into a first concept vector using a deep learning model, such as a convolutional neural network. And, transforming a second linguistic item into a second concept vector using the deep learning model.); and 

a query/ads model optimization engine configured for obtaining a query/ads model, via optimization with respect to an objective function, based on vectors associated with each of the plurality of subwords of each of the queries and vectors for the queries obtained from the neural network (He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism 902 to iteratively achieve an objective defined an objective function … When the iterative processing is finished, the final parameter values constitute the trained model.”), wherein the optimization involves training of an input vector u for each of the plurality of subwords associated with each of the queries, an input vector u for each of the advertisements and links, and a matrix (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: “clicked ads” reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.).

[EXAMINER NOTE: As stated above, the limitations “a subword vector generator configured for…”,  “a subword vector combiner configured for…”, “a query/ads model optimization engine configured for…” have been deemed to invoke 35 U.S.C. 112(f). Examiner has identified the structure(s) disclosed in Paragraphs [00102]-[00104] of Applicant’s Specification, as sufficient to perform the recited functions of the subword vector generator, the subword vector combiner, and the query/ads model optimization engine, as claimed.].  


Regarding claim 17, He teaches all of the limitations of claim 15, and He further teaches wherein the optimization involves training of an input vector u for each of the plurality of subwords associated with each of the queries, an input vector u for each of the advertisements and links, and a matrix (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0095] teaches click-through data describes queries, keywords, and clicked ads [Note: “clicked ads” reading on hyperlinks]; He, Paragraph [0097] further teaches “click-through data encompasses a plurality of instances of training data…”; He, Paragraph [0098] further teaches, in a preliminary operation, training system operates on the linguistic items in the training set, as expressed in a letter-trigram window vector form. The preliminary operation comprising conversion of queries and documents to their respective letter-trigram window vector forms.).



Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
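3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.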

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

21.	Claims 2, 5-6, 9, 12-13, 16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20150278200 A1) in view of Gao et al. (US 20190114348 A1). 


Regarding claim 2, He teaches all of the limitations of claim 1, and He further teaches wherein the neural network includes one of a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network.); and

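 	the optimization further involves training of a plurality of parameters associated with the neural network (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix).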

	However, He does not teach wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN); 

Nevertheless, Gao teaches: 
wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN) (Gao, Paragraph [0035] and [0140] teach sequence-to-sequence neural network composed of Long-Short-Term Memory (LSTM) units [Note: LSTM is a type of RNN - reading on RNN as claimed]; Gao, Paragraphs [0092]-[0095] teach convolutional neural network (CNN) and convolutional component [Note: reading on CNN as claimed]); and

 the optimization further involves training of a plurality of parameters associated with the neural network (Gao, Paragraph [0040] teaches the training system can train the generator component and the discriminator component in two respective phases; Gao, Paragraph [0035] teaches “the generator component is controlled by a set of parameter values…The training framework generates those parameter values through an iterative machine learning process.”; Gao, Paragraph […]). 

[EXAMINER NOTE: LSTM networks are a type of recurrent neural network (RNN).]. 

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, with the convolutional neural network (CNN) and Long-Short-Term Memory (LSTM) network, as taught by Gao, in order to provide faster delivery of content, make more efficient use of system resources, and also produce better matching results. (Gao, Paragraph [0003]).



Regarding claim 5, the combination of He in view of Gao teaches all of the limitations of claim 2, and He further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches CNN having a plurality of layers; He, Paragraph [0032] further teaches CNN has convolution matrix [reading on CNN filter, also known as CNN kernel] and semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).
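For illustration only, the claimed data flow (subword vectors in at the first layer, a single query vector out at the last layer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of He, Gao, or Applicant: the names `conv_layer`, `query_vector`, `W_c`, and `W_s` are hypothetical, the tanh activation and zero padding are assumptions, and the max-pooling step is an assumption that is not drawn from the passages cited in this action.

```python
import numpy as np


def conv_layer(X, W_c, window=3):
    """Convolution layer: X is (T, d), one row per subword vector.

    W_c is (window * d, k); each of its k columns acts as one filter.
    Returns a (T, k) array of local feature vectors.
    """
    T, d = X.shape
    pad = window // 2
    # Zero-pad so that every position has a full window (assumed convention).
    Xp = np.vstack([np.zeros((pad, d)), X, np.zeros((pad, d))])
    windows = np.stack([Xp[t:t + window].reshape(-1) for t in range(T)])
    return np.tanh(windows @ W_c)


def query_vector(X, W_c, W_s, window=3):
    """Map a variable-length sequence of subword vectors to one query vector."""
    H = conv_layer(X, W_c, window)   # (T, k) local features
    pooled = H.max(axis=0)           # (k,) max over positions (assumption)
    return np.tanh(pooled @ W_s)     # projection to the output vector


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))          # 5 subword vectors of dimension 4
W_c = rng.normal(size=(12, 8))       # 8 filters over windows of 3 subwords
W_s = rng.normal(size=(8, 6))        # projection to a 6-dimensional vector
print(query_vector(X, W_c, W_s).shape)
```

In this sketch the output dimension does not depend on the number of subwords T, which is why a single fixed-size vector can represent queries of different lengths.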



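Regarding claim 6, the combination of He in view of Gao teaches all of the limitations of claim 2, and the combination further teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402; Gao, Paragraphs [0083]-[0084], disclosing LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states.).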

Motivation to combine same as claim 2 (as stated above).


Regarding claim 9, He teaches all of the limitations of claim 8, and He further teaches wherein the neural network includes one of a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network.); and

	the optimization further involves training of a plurality of parameters associated with the neural network (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism… When the iterative processing is finished, the final parameter values constitute the trained model.”).

	However, He does not teach wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN); 

Nevertheless, Gao teaches: 
wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN) (Gao, Paragraph [0035] and [0140] teach sequence-to-sequence neural network composed of Long-Short-Term Memory (LSTM) units [Note: LSTM is a type of RNN - reading on RNN as claimed]; Gao, Paragraphs [0092]-[0095] teach convolutional neural network (CNN) and convolutional component [Note: reading on CNN as claimed]); and

 the optimization further involves training of a plurality of parameters associated with the neural network (Gao, Paragraph [0040] teaches the training system can train the generator component and the discriminator component in two respective phases; Gao, Paragraph [0035] teaches “the generator component is controlled by a set of parameter values…The training framework generates those parameter values through an iterative machine learning process.”; Gao, Paragraph [0037] further teaches the discriminator component is defined by another set of parameter values, and a training system successively updates the parameter values to achieve a training objective. [Note: Gao, [0035] teaches generator component corresponds to the LSTM network; Gao, [0087] teaches discriminator component includes a CNN or any other type of deep neural network.]).

[EXAMINER NOTE: LSTM networks are a type of recurrent neural network (RNN).]

Motivation to combine same as claim 2 (as stated above).


Regarding claim 12, the combination of He in view of Gao teaches all of the limitations of claim 9, and He further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches CNN having a plurality of layers; He, Paragraph [0032] further teaches CNN has convolution matrix [reading on CNN filter – also known as CNN kernel] and semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).



Regarding claim 13, the combination of He in view of Gao teaches all of the limitations of claim 9, and the combination further teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402; Gao, Paragraphs [0083]-[0084], disclosing LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states.).

Motivation to combine same as claim 2 (as stated above).



Regarding claim 16, He teaches all of the limitations of claim 15, and He further teaches wherein the neural network includes one of a convolutional neural network (CNN) … (He, Paragraph [0121] teaches the vectors are obtained using a deep learning model such as a convolutional neural network.); and

 	the optimization further involves training of a plurality of parameters associated with the neural network (He, Paragraph [0094] teaches training system processes a corpus of click-through data to generate the model, wherein the model represents and/or is described by a convolutional matrix and a semantic projection matrix; He, Paragraph [0099] teaches “The training system 104 operates by using an iterative solving mechanism… When the iterative processing is finished, the final parameter values constitute the trained model.”); 

and the plurality of subwords includes at least one type including any of a unigram, k-gram, for k>1, and any combination thereof (He, Paragraph [0098] teaches conversion of queries and documents to their respective letter-trigram window vector forms [Note: letter-trigram, as disclosed in He, reading on k-gram, for k>1, as claimed]; He, Paragraph [0059] teaches alternative whole-word (word-gram) vector [reading on unigram]).


	However, He does not teach wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN); 

Nevertheless, Gao teaches: 
wherein the neural network includes one of a convolutional neural network (CNN) and a recurrent neural network (RNN) (Gao, Paragraph [0035] and [0140] teach sequence-to-sequence neural network composed of Long-Short-Term Memory (LSTM) units [Note: LSTM is a type of RNN - reading on RNN as claimed]; Gao, Paragraphs [0092]-[0095] teach convolutional neural network (CNN) and convolutional component [Note: reading on CNN as claimed]); and the optimization further involves training of a plurality of parameters associated with the neural network (Gao, Paragraph [0040] teaches the training system can train the generator component and the discriminator component in two respective phases; Gao, Paragraph [0035] teaches “the generator component is controlled by a set of parameter values…The training framework generates those parameter values through an iterative machine learning process.”; Gao, Paragraph [0037] further teaches the discriminator component is defined by another set of parameter values, and a training system successively updates the parameter values to achieve a training objective. [Note: Gao, [0035] teaches generator component corresponds to the LSTM network; Gao, [0087] teaches discriminator component includes a CNN or any other type of deep neural network.]).

[EXAMINER NOTE: LSTM networks are a type of recurrent neural network (RNN).]

Motivation to combine same as claim 2 (as stated above).


Regarding claim 18, the combination of He in view of Gao teaches all of the limitations of claim 16, and He further teaches wherein the CNN comprises a plurality of layers, each of which comprises a plurality of filters, wherein a first of the plurality of layers takes a plurality of vectors of a plurality of subwords obtained from a query as input and a last of the plurality of layers outputs a vector for the query (He, Paragraph [0032] teaches CNN having a plurality of layers; He, Paragraph [0032] further teaches CNN has convolution matrix [reading on CNN filter – also known as CNN kernel] and semantic projection matrix [performing additional filtering functions]; He, Paragraph [0066] teaches “the convolution module 312 produces a number (T) of letter-trigram window vectors and corresponding LCF vectors, where that number (T) that depends on the number of words in the word sequence 402. Each LCF vector may have a greatly reduced dimensionality compared to its corresponding letter-trigram window vector.” [Note: “word sequence” understood to correspond to the input query or document]).



Regarding claim 19, the combination of He in view of Gao teaches all of the limitations of claim 16, and the combination further teaches wherein the RNN comprises a plurality of long-short term memory (LSTM) cells connected in a sequence from a first LSTM cell to a last LSTM cell of the sequence, each of the plurality of LSTM cells has a current state vector and is associated with a transition function which, upon receiving an input, transforms the current state vector to a next state vector (Gao, Paragraph [0035] teaches sequence-to-sequence neural network composed of a recursive arrangement of LSTM units; Gao, Paragraph [0083] teaches Fig. 5 showing illustrative LSTM unit including an input gate, an output gate, a forget gate, and a cell; Gao, Paragraph [0071] teaches recursive neural network includes a chain of processing units… each processing unit outputs a hidden state vector h(t) at a time step t. That hidden state vector constitutes an input to a next processing unit in the chain of processing units…The output vector b constitutes the hidden state vector that is output by the last processing unit of the encoder 402; Gao, Paragraphs [0083]-[0084], disclosing LSTM recurrent transition function encoding hidden/cell states and previous hidden/cell states.).

Motivation to combine same as claim 2 (as stated above).

22.	Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over He in view of Gao, in further view of Xin et al. (US 20190065460 A1). 

Regarding claim 7, the combination of He in view of Gao teaches all of the limitations of claim 6, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches generator component maps a representation of a query and an instance of random information to a key term. [Note: generator component, in Gao 0038, comprises the disclosed sequence to sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038] “For a given sentence (x1, x2, . . . , x.sub.n) containing n words, each represented as a d-dimensional vector, an LSTM computes a representation [right arrow over (h)].sub.t of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output [[right arrow over (h)].sub.t, ]. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches the bidirectional LSTM layers may deal with sequential data… given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the method(s) and system(s) for transforming a first linguistic item (corresponding to a query) into a first concept vector, and a second linguistic item (corresponding to a phrase, or a document, or a keyword, or an ad) into a second concept vector for intelligently matching queries to ads, as taught by He, as modified by the convolutional neural network (CNN) and Long-Short-Term Memory (LSTM) network, as taught by Gao, to further include the bidirectional LSTM operation, as disclosed in Xin, in order to overcome drawbacks in the prior art that are difficult to develop and do not scale well by providing a method and system in which named entity recognition (NER) can adapt to new languages and new domains. (Xin, Paragraphs [0016] and [0038]).



Regarding claim 14, the combination of He in view of Gao teaches all of the limitations of claim 13, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches generator component maps a representation of a query and an instance of random information to a key term. [Note: generator component, in Gao 0038, comprises the disclosed sequence to sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038] “For a given sentence (x1, x2, . . . , x.sub.n) containing n words, each represented as a d-dimensional vector, an LSTM computes a representation [right arrow over (h)].sub.t of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output [[right arrow over (h)].sub.t, ]. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches the bidirectional LSTM layers may deal with sequential data… given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

Motivation to combine same as claim 7 (as stated above).


Regarding claim 20, the combination of He in view of Gao teaches all of the limitations of claim 19, and the combination further teaches wherein a vector for a query is derived, using RNN (Gao, Paragraph [0038] teaches generator component maps a representation of a query and an instance of random information to a key term. [Note: generator component, in Gao 0038, comprises the disclosed sequence to sequence neural network composed of LSTM units – reading on using RNN, as claimed]),...

However, the combination does not distinctly disclose … based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query.

Nevertheless, Xin teaches …, based on state vectors associated with the last LSTM cells obtained via a bi-directional operation using a plurality of vectors for a plurality of subwords of the query (Xin, Paragraph [0038] “For a given sentence (x1, x2, . . . , x.sub.n) containing n words, each represented as a d-dimensional vector, an LSTM computes a representation [right arrow over (h)].sub.t of the left context of the sentence. However, the LSTM's hidden state hi takes information only from the past (left), knowing nothing about the future. Thus, generating a representation of the right context as well should add useful information. This can be achieved using a second LSTM that reads the same sequence in reverse. The former may be referred to as the forward LSTM and the latter as the backward LSTM. The two hidden states are concatenated to form the bi-directional LSTM (BLSTM) output [[right arrow over (h)].sub.t, ]. Thus each sequence is presented forward and backward on two separate hidden states to capture past and future information.”; Xin, Paragraph [0036] further teaches the bidirectional LSTM layers may deal with sequential data… given the input vectors, LSTMs return the sequence that represents the sequential information at every step in the input.).

Motivation to combine same as claim 7 (as stated above).

Prior Art
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Ordentlich et al., “Network-Efficient Distributed Word2vec Training System for Large Vocabularies”, CIKM’16, October 24-28, 2016.
Zhai et al., “DeepIntent: Learning Attentions for Online Advertising with Recurrent Neural Networks”, KDD’16, August 13-17, 2016.
Edizel et al., “Deep Character-Level Click-Through Rate Prediction for Sponsored Search”, ACM, 2017.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156.  The examiner can normally be reached on Mon. - Fri. 7:30 a.m. - 5:00 p.m.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/B.R.B./Examiner, Art Unit 2123
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116