DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application and claims filed 3/28/2019.
Claims 1-20 are pending and have been examined.

Information Disclosure Statement
Acknowledgment is made of the information disclosure statements filed 3/28/2019 and 7/15/2020, which comply with 37 CFR 1.97. As such, the information disclosure statements have been placed in the application file and the information referred to therein has been considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(3) because Figures 4-8 and 11 include letters which do not measure at least .32 cm. (1/8 inch) in height (see, e.g., the subscript characters in elements 404 and 702, the lowercase characters in elements 408, 506, 606, 802, 804 and 806 in FIGs. 4-8, and some characters in elements 1106, 1108 and 1110, and text “GAME” and “SERVER” at the bottom of FIG. 11). See MPEP 507 (A) and 37 CFR 1.84(p)(3): Numbers, letters, and reference characters must measure at least .32 cm. (1/8 inch) in height.
The drawings are also objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference characters not mentioned in the description: 
Reference character 1006 shown in Figure 10 is not found in the detailed description.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities:
Reference character 1006 shown in Figure 10 is not described in applicant’s specification (see, e.g., paragraphs 76 and 78 describing FIG. 10). Appropriate correction is required.
In the last sentence of paragraph 53, non-patent literature “Wen Li, et al., "Approximate Nearest Neighbor Search on High Dimension Data - Experiments, Analyses, and Improvement," arXiv:1610.02455v1 [cs.DB], October 8, 2016, 26 pages.” is referred to. The listing of references in the specification is not a proper information disclosure statement. 37 CFR 1.98(b) requires a list of all patents, publications, or other information submitted for consideration by the Office, and MPEP § 609.04(a) states, "the list may not be incorporated into the specification but must be submitted in a separate paper." Therefore, unless the reference has been cited by the examiner on form PTO-892, it has not been considered. It is noted, however, that applicant appears to have furnished a copy of this reference in the above-referenced information disclosure statement filed 3/28/2019 (see, e.g., NPL 3).
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words. It is important that the abstract not exceed 150 words in length since the space provided for the abstract on the computer tape used by the printer is limited. The form and legal phraseology often used in patent claims, such as "means" and "said," should be avoided. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, "The disclosure concerns," "The disclosure defined by this invention," "The disclosure describes," etc. 
The abstract of the disclosure is also objected to because the first sentence is grammatically incorrect and appears to be missing one or more words between “TF-modifying” and “to modify” in the phrase “using a TF-modifying to modify the term-specific frequency information…”. Based on the penultimate sentence of the abstract, which reads “Both the TF-modifying component and the projection component can use respective machine-trained neural networks.”, it appears the word “component” is missing between “TF-modifying” and “to modify” in the first sentence of the abstract. Correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 5, 7-8 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Trepess et al. (U.S. Patent Application Pub. No. 2006/0095852 A1, hereinafter “Trepess”) in view of Dua et al. (U.S. Patent Application Pub. No. 2020/0019642 A1, hereinafter “Dua”). 
With respect to claim 1, Trepess discloses the invention as claimed including one or more computing devices for processing an instance of text (see, e.g., paragraphs 14 and 39-40, “an information retrieval apparatus for searching a set of information items and displaying the results of the search, the information items each having a set of characterising information features”, “an information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data”, “The storage system operates in two general modes of operation. In a first mode, a set of information items (e.g. textual information items) is assembled on the disk storage 30 or on a network disk drive connected via the network 50 and is sorted and indexed ready for a searching operation.” [i.e., an apparatus/computer/computing device for processing textual information/an instance of text]), comprising:
hardware logic circuitry, the hardware logic circuitry including: (a) one or more hardware processors that perform operations by executing machine-readable instructions stored in a memory, and/or (b) one or more other hardware logic units that perform operations using a task-specific collection of logic gates (see, e.g., paragraphs 14 and 39 and claims 29-30, “an information retrieval apparatus for searching a set of information items … The apparatus comprises a search processor operable to search the information items … A mapping processor operable to generate data representative of a map of information items from a set of information items identified in the search” [i.e., hardware logic circuitry of the apparatus includes search and mapping hardware processors/logic units that perform operations], “an information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data” [i.e., the computer/computing device include a processor/hardware logic circuitry/unit and programs that are executable machine-readable instructions stored in a memory/disk storage 30], “Computer software having program code for carrying out a method”, “medium for providing program code” [i.e., software/program code includes executable machine-readable instructions stored in a memory/medium for performing operations/carrying out the method]), the operations including:
receiving an instance of input text in response to an action taken by a user using a user computing device (see, e.g., paragraphs 15, 18-19 and 85, “The present invention addresses a technical problem of defining a search query for search information items and for refining a search for information items, which particularly advantageous for searching a [sic – and] navigating large amounts of data”, “features may be combined to form a search query in accordance with the Boolean operators specified the user.”, “a graphical user interface … for forming a search query … the conditions for the search are specified by Boolean operators. Accordingly a user may specify a search query in accordance with the information items elected in different rows of the interface”, “user then initiates the search, for example by pressing enter on the keyboard 70 or by using the mouse 80 to select a screen ‘button’ to start the search” [i.e., receiving search information items responsive to an action taken by a user using a computing device]);
generating an input term-frequency (TF) vector that includes frequency information relating to frequency of occurrence of terms in the input text, the input TF vector corresponding to a … vector that includes a dimension for each term in the input text (see, e.g., paragraphs 49, 53, 60 and 62, “Feature extraction is the process of transforming raw data into an abstract representation. These abstract representations can then be used for processes such as pattern classification, clustering and recognition. In this process, a so-called ‘feature vector’ is generated, which is an abstract representation of the frequency of terms used within a document”, “The size of the feature vector, and so the dimension of the term frequency histogram, is reduced”, “a term frequency histogram is generated for each document in the set … by counting the number of times words present in the dictionary (pertaining to that document set) occur within an individual document”, “a histogram may plot the frequency of over 50000 different terms, giving the histogram a dimension of over 50000” [i.e., generating an input term frequency feature vector including frequency information relating to frequency of occurrence/number of times terms/words occur in the input text/data and a dimension for each term in the text]);
using a TF-modifying … to modify the frequency information in the input TF vector, associated with respective terms, by respective machine-trained weighting factors, to produce an intermediate vector, the TF-modifying … being implemented by the hardware logic circuitry (see, e.g., paragraphs 64, 67, 73-74 and 77, “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’”, “The size of the feature vector, and so the dimension of the term frequency histogram, is reduced” [i.e., using term-frequency/TF modifying to modify the frequency information in the TF vector, producing an intermediate, reduced dimension feature vector], “values in the feature vectors being used to train the map”, “initially each of these weights is set to a random value, and then, through an iterative process, the weights are ‘trained’. The map is trained by presenting each feature vector to the input nodes of the map.”, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document” [i.e., the system/apparatus performs training and modifies the frequency information in the TF vector based on machine-trained weights/weighting factors to produce an intermediate vector with reduced dimensions after the training phase]) … ;
using a projection … to project the intermediate vector into a … vector having a dimensionality that is less than a dimensionality of the input TF vector, the … vector providing a distributed compact representation of semantic information in the input text, the projection … being implemented by the hardware logic circuitry (see, e.g., paragraphs 11, 43, 66-67 and 70-71, “The n-value vectors are then mapped onto smaller dimensional vectors (i.e. vectors having a number of values m) … which is substantially less than n … by multiplying the vector by an (nxm) ‘projection matrix’ formed of an array of random numbers … to generate vectors of smaller dimension where any two reduced-dimension vectors have much the same vector dot product as the two respective input vectors”, “information may be stored in a distributed manner, for example at various sites across the internet”, “Semantic Indexing-a technique whereby the dimension of the histogram is reduced by looking for groups of terms that have a high probability of occurring simultaneously in documents” [i.e., distributed, compact/reduced dimension representation of semantic information], “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’ … reducing the dimension of the histogram by multiplying it by a matrix of random numbers”, “mapping is not perfect, but suffices for the purposes of characterising the content of a document in a compact way.” [i.e., a compact representation of input information], “Once feature vectors have been generated for the document collection, thus defining the collection's information space, they are projected into a two-dimensional SOM [self-organizing map] at a step 150 to create a semantic map. … process of mapping to 2-D by clustering the feature vectors” [i.e., the system/apparatus uses a projection to project the intermediate feature vector into a vector with fewer dimensions/2-D and providing a semantic map/distributed compact representation of semantic information in the input text]) … ;
utilizing the … vector to produce an output result (see, e.g., paragraphs 77-78 and 94-95, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document”, “presenting the feature vector for each document to the map to see where it lies yields an x, y map position for each document. These x, y positions … can be used to visualize the relationship between documents”, “FIGS. 7, 8 and 9 provide example illustrations of how information items are searched with respect to a search query and how the results of the search are displayed”, “search processor 404 may be arranged to search the information items and to generate search results, which identify information items which correspond to a search query. The mapping processor 412 may then receive data representing the results of the search identifying information items corresponding to the search query. The mapping processor then generates the x, y co-ordinates of the positions in the array corresponding to the identified information items.” [i.e., using the reduced dimension vector to produce the search results/output result]); and
providing the output result to an output device of the user computing device (see, e.g., paragraphs 14 and 85, “an information retrieval apparatus for searching a set of information items and displaying the results of the search”, “key words in the search enquiry area 250 are then compared with the information items … This generates a list of results, each of which is shown as a respective entry 280 in the list area 260. Then the display area 270 displays display points corresponding to each of the result items.”).
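For illustration only, the term frequency histogram and “random mapping” steps cited above from Trepess can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Trepess or from applicant's specification: the function names, the five-term vocabulary, the Gaussian matrix entries and the target dimension m = 2 are all hypothetical.

```python
import random
from collections import Counter


def tf_vector(text, vocabulary):
    """Count how many times each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def random_projection_matrix(n, m, seed=0):
    """Build an n x m matrix of random numbers (Gaussian entries are an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]


def project(vector, matrix):
    """Multiply a length-n vector by an n x m matrix, giving a length-m vector."""
    m = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(m)]


# Hypothetical vocabulary and document, chosen only for the example.
vocabulary = ["search", "map", "document", "vector", "term"]
tf = tf_vector("search the document map and search the term vector", vocabulary)
reduced = project(tf, random_projection_matrix(len(vocabulary), 2))

print(tf)            # [2, 1, 1, 1, 1]
print(len(reduced))  # 2
```

The sketch reduces a five-dimensional histogram to two values by a single matrix multiplication; it omits the later self-organizing map step.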
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing the input TF vector corresponding to an n-hot vector that includes a dimension for each term in the input text and
using a TF-modifying neural network to modify the frequency information in the input TF vector … the TF-modifying neural network being implemented by the hardware logic circuitry and including at least one layer of neurons; 
using a projection neural network to project the intermediate vector into an embedding vector … the projection neural network … including at least one layer of neurons and
utilizing the embedding vector to produce an output result.
In the same field, analogous art Dua teaches the input TF vector corresponding to an n-hot vector that includes a dimension for each term in the input text (see, e.g., paragraphs 41 and 46, “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector … the values of the vector slots in the BoN [bag-of ngrams] vector are calculated as the TF-IDF of the corresponding ngram” [i.e., an input term frequency/TF vector], “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings, such as one-hot vector representations of the words in which the vector comprises a slot for every word in the vocabulary” [i.e., the TF vector corresponds to a one-hot/n-hot vector including one dimension for each word/term in the text]);
using a TF-modifying neural network to modify the frequency information in the input TF vector … the TF-modifying neural network being implemented by the hardware logic circuitry and including at least one layer of neurons (paragraphs 90, 107 and 109 of applicant’s specification repeat the claim language and paragraphs 102 and 109 of applicant’s specification state “the TF-modifying neural network and the projection neural network operate based on a machine-trained model produced by a training environment.” Therefore, “a TF-modifying neural network”, under the broadest reasonable interpretation (BRI), in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 43 and 48-49, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120, which is encoded … The generator G 100 and discriminator D 130 may be implemented as software and/or hardware logic of one or more computing devices” [i.e., the TF-modifying neural network/generator neural network G being implemented by the hardware logic circuitry], “Each row [projected z; n-gram embedding] of the concatenation matrix 116 is input to a neural network (NN) 118, which in some illustrative embodiments may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer [i.e., the neural network includes multiple layers of perceptrons/neurons] … the neural network will output |V| numbers, which can be interpreted as a vector G(z), that represents the output BoN … the output vector G(z) represents a probability distribution over the ngrams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on TF-IDF values of the ngrams.” [i.e., generator neural network/G modifies frequency information in the TF vector], “the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text” [i.e., using the generator neural network/G to modify/compute the TF vector]);
using a projection neural network to project the intermediate vector into an embedding vector … the projection neural network … including at least one layer of neurons (see, e.g., paragraphs 43, 45-46 and 48-50, “discriminator neural network (discriminator) D 130 receives as input an encoded bag-of-ngrams 132, which may be the bag-of-ngrams G(z) 120, for example, or a bag-of-ngrams obtained from an actual (true) portion of natural language text, and outputs an output value D(G(z))”, “The projection is a fully connected neural network layer followed by a non-linearity such as the ReLU”, “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings”, “a neural network (NN) … may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer” [i.e., a projection neural network/D includes at least one layer of perceptrons/neurons], “in the discriminator D 130, an input of a BoN model G(z) 132 is received, which may be the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text.” [i.e., bag of ngrams/BoN includes an intermediate vector], “the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112.” [i.e., the projection neural network/D projects the intermediate vector into an embedding matrix/vector]); and
utilizing the embedding vector to produce an output result (see, e.g., paragraphs 50 and 54-55, “After retrieving the ngram embeddings 110 and generating the matrix B 112, logic of the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112. It should be noted that after this multiplication operation, the embeddings of ngrams that are not in the input encoded BoN 132 will all have zero values. The result of this multiplication is projected by discriminator projection logic 140 to generate an output matrix 142.”, “concatenated matrix 222 comprises rows having a first portion corresponding to the corresponding rows in the matrix Z 214, a second portion corresponding to the embeddings vector 210 of the question q 204, and a third portion corresponding to the n-gram embedding matrix 218 of the vocabulary 220. The neural network 224 processes the concatenation matrix 222 in a similar manner as previously described above, to generate the output BoN”, “After retrieving the ngram embeddings 218 and generating the n-gram embedding matrix 218, logic of the discriminator 230 multiplies the encoded ngram in the BoN model G(z, q) by the ngram embeddings in the vector matrix … The result of this multiplication is projected by discriminator projection logic 234 to generate an output matrix” [i.e., using the embedding vector to generate/produce an output matrix/result]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
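For illustration only, the two claimed stages discussed above (weighting the TF vector term by term to produce an intermediate vector, then projecting that vector into a lower-dimensional embedding) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of applicant, Trepess or Dua: the element-wise form of the weighting, the tanh activation, the function names and every numeric value are hypothetical, and the weights are placeholders rather than machine-trained values.

```python
import math
import random


def tf_modify(tf, weights):
    """Scale each term's frequency by that term's weighting factor."""
    return [w * f for w, f in zip(weights, tf)]


def projection_layer(vector, matrix, bias):
    """One fully connected layer followed by tanh (the activation is an assumption)."""
    out = []
    for j in range(len(bias)):
        total = bias[j] + sum(v * matrix[i][j] for i, v in enumerate(vector))
        out.append(math.tanh(total))
    return out


rng = random.Random(0)
tf = [2, 1, 0, 1]                 # hypothetical 4-term TF vector
weights = [0.5, 1.5, 1.0, -0.8]   # placeholder factors; the last one is negative
matrix = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)]
bias = [0.0, 0.0]

intermediate = tf_modify(tf, weights)
embedding = projection_layer(intermediate, matrix, bias)

print(intermediate)    # [1.0, 1.5, 0.0, -0.8]
print(len(embedding))  # 2
```

In this sketch the embedding has two dimensions against four in the input TF vector, and the negative placeholder factor de-emphasizes the fourth term.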

Regarding claim 3, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein at least one machine-trained weighting factor applied to a term by the TF-modifying neural network is negative, which represents a negative emphasis on the term.
In the same field, analogous art Dua teaches wherein at least one machine-trained weighting factor applied to a term by the TF-modifying neural network is negative, which represents a negative emphasis on the term (as indicated above, the “TF-modifying neural network”, under the BRI, in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 52 and 126, “The proposed GAN [generative adversarial network] based architecture … can be easily adapted to generate bag-of ngrams conditioned in other texts, e.g., other questions, or classes, e.g., sentiment classes positive/negative. … the conditional GAN may be trained where the generator also receives as input the ‘class’ of the BoN.”, “the operation of the neural network in the generator is modified, e.g., weights are modified, to attempt to improve the BoN output … If the output is not correct, a determination is made as to whether training is complete” [i.e., a machine-trained weight/weighting factor applied to a term by the TF-modifying/generator neural network represents a negative emphasis on the term – negative sentiment or an incorrect output/query-question answer]).
The motivation to combine Trepess and Dua is the same as discussed above with respect to claim 1.

Regarding claim 5, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein said utilizing comprises:
finding one or more candidate items, each of which has a candidate embedding vector having a prescribed relation to the embedding vector associated with the input text in a vector space,
wherein the output result conveys information regarding said one or more candidate items to the user.
In the same field, analogous art Dua teaches wherein said utilizing comprises:
finding one or more candidate items, each of which has a candidate embedding vector having a prescribed relation to the embedding vector associated with the input text in a vector space (see, e.g., paragraphs 46 and 56-57, “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings”, “the vector representation 210 of the word embedding of the input question 204. The vector r 240 is given as input to a neural network 242” [i.e., embedding vector associated with input text/question in a vector space], “The BoN model G(z, q) 232 is used to select a candidate answer to the input question q 204 generated by a QA system. … the BoN model 232 for the answer to the input question q 204 may be used to compare the ngrams in the BoN model 232 to the ngrams in a candidate answer.” [i.e., finding/selecting a candidate answer/item that has a candidate embedding vector related to/with a prescribed relation to the embedding vector associated with the input text/question q in a vector space]),
wherein the output result conveys information regarding said one or more candidate items to the user (see, e.g., paragraphs 83-84 and 88, “Content users input questions to cognitive system which implements the QA pipeline. The QA pipeline then answers the input questions using the content in the corpus of data by evaluating documents, sections of documents, portions of data in the corpus, or the like. … a response is provided containing one or more answers to the question.”, “QA pipeline receives an input question, parses the question to extract the major features of the question, uses the extracted features to formulate queries, and then applies those queries to the corpus of data. Based on the application of the queries to the corpus of data, the QA pipeline generates a set of … candidate answers to the input question”, “the cognitive system 300 provides a response to users in a ranked list of candidate answers/responses” [i.e., the output result/response conveys information regarding the candidate answers to the user]).
The motivation to combine Trepess and Dua is the same as discussed above with respect to claim 1.
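For illustration only, finding candidate items whose embedding vectors bear a prescribed relation to the embedding vector of the input text can be sketched as a brute-force similarity ranking. This is a minimal sketch under stated assumptions: cosine similarity as the prescribed relation, the item names and the two-dimensional vectors are hypothetical and are not drawn from the claims, Trepess or Dua.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_candidates(query, candidates, k=1):
    """Return the names of the k candidates most similar to the query embedding."""
    ranked = sorted(candidates,
                    key=lambda name: cosine(query, candidates[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical candidate embeddings, keyed by hypothetical item names.
candidates = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [0.7, 0.7],
}

print(nearest_candidates([0.9, 0.1], candidates, k=2))  # ['item_a', 'item_c']
```

A production system would likely replace the exhaustive ranking with an approximate nearest neighbor index; the sketch ranks every candidate only to keep the example short.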

Regarding claim 7, as discussed above, Trepess in view of Dua teaches the device of claim 5.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein each candidate item corresponds to a candidate product described by the input text.
In the same field, analogous art Dua teaches wherein each candidate item corresponds to a candidate product described by the input text (see, e.g., paragraphs 62-63, 110 and 83-84, “a first request processing pipeline may be trained to operate on input requests (or questions) directed to a medical domain, while a second request processing pipeline may be trained to operate on input request, e.g., input questions, directed to a financial domain”, “each request processing pipeline may have their own associated corpus or corpora that they ingest and operate on, e.g., one corpus for medical treatment documents and another corpus for financial domain related documents”, “a first corpus may be associated with healthcare documents while a second corpus may be associated with financial documents… while another corpus may be IBM Redbooks” [i.e., input text/question describes/corresponds to a medical or healthcare product, a financial product or a technical book product], “Content users input questions to cognitive system which implements the QA pipeline … a response is provided containing one or more answers to the question.”, “QA pipeline receives an input question, parses the question to extract the major features of the question, uses the extracted features to formulate queries, and then applies those queries to the corpus of data. Based on the application of the queries to the corpus of data, the QA pipeline generates a set of … candidate answers to the input question” [i.e., each candidate answer/item corresponds to a candidate financial/medical/healthcare/book product described by the input text/question]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 

Regarding claim 8, as discussed above, Trepess in view of Dua teaches the device of claim 5.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein each candidate item corresponds to a candidate product that is complementary to a product described by the input text.
In the same field, analogous art Dua teaches wherein each candidate item corresponds to a candidate product that is complementary to a product described by the input text (paragraphs 45 and 97 of applicant’s specification repeat the claim language and paragraphs 55-56 of applicant’s specification state “recommendation engine 504 to find one or more candidate items that are considered complementary to a product described by the input text. For example, assume that the input text describes a book title ‘Visiting Venice on a Budget.’ The recommendation engine 504 identifies an image 506 of a book that is considered related to the book described by the input text, although not the same book as described by the input text.” and “component 128 deems a query item to be related to a candidate positive item when they pertain to complementary items, but not the same item.” Therefore, “a candidate product that is complementary to a product described by the input text” under the BRI, in light of the specification, is any product or service related or pertaining to an item described in an input text, question or query) (see, e.g., paragraphs 82, 84 and 110, “cognitive systems provide mechanisms for answering questions posed to these cognitive systems using a Question Answering pipeline or system (QA system) and/or process requests which may or may not be posed as natural language questions. The QA pipeline or system … answers questions pertaining to a given subject-matter domain … may include any file, text, article, or source of data for use in the QA system. For example, a QA pipeline accesses a body of knowledge about the domain, or subject matter area, e.g., financial domain, medical domain, legal domain, etc.”, “Based on the application of the queries to the corpus of data, the QA pipeline generates a set of … candidate answers to the input question”, “a first corpus may be associated with healthcare documents while a second corpus may be associated with financial documents… while another corpus may be IBM Redbooks” [i.e., each candidate answer/item corresponds to a candidate pertaining to a financial/medical/healthcare/book product described by the input text/question]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
 
Regarding claim 12, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Trepess further discloses another … vector that provides a distributed compact representation of input information (see, e.g., paragraphs 40 and 70-73, “information may be stored in a distributed manner, for example at various sites across the internet”, “mapping is not perfect, but suffices for the purposes of characterising the content of a document in a compact way.” [i.e., a distributed, compact representation of input document information], “Once feature vectors have been generated for the document collection, thus defining the collection's information space, they are projected into a two-dimensional SOM [self-organizing map] at a step 150 to create a semantic map. … process of mapping to 2-D by clustering the feature vectors”, “Self-Organising map is used to cluster and organise the feature vectors that have been generated for each of the documents.”, “self-organising map consists of input nodes 170 and output nodes 180 in a two-dimensional array or grid of nodes illustrated as a two-dimensional plane 185. There are as many input nodes as there are values in the feature vectors” [i.e., feature vectors include another vector that provides a map of nodes/distributed compact representation of input information/input nodes]).
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing combining the embedding vector with another embedding vector … to produce a combined vector; and performing analysis based on the combined vector.
In the same field, analogous art Dua teaches combining the embedding vector with another embedding vector … to produce a combined vector (see, e.g., paragraphs 54-55, “concatenated matrix 222 comprises rows having a first portion corresponding to the corresponding rows in the matrix Z 214, a second portion corresponding to the embeddings vector 210 of the question q 204, and a third portion corresponding to the n-gram embedding matrix 218 of the vocabulary 220.” [i.e., concatenating/combining the embedding vector 210 with another embedding matrix 218/vector to produce concatenated matrix 222/a combined vector]); and
performing analysis based on the combined vector (see, e.g., paragraph 55, “The neural network 224 processes the concatenation matrix 222 … to generate the output BoN model G(z,q) 226 representing the probability distribution over the ngrams of the vocabulary 220 conditioned by … the embeddings vector 210 of the input question.” [i.e., processing/performing analysis based on the concatenated matrix 222/ combined vector]).
The motivation to combine Trepess and Dua is the same as discussed above with respect to claim 1.
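For illustration only, the "combining" and "performing analysis" limitations of claim 12, as mapped above to Dua's concatenation, can be sketched as follows. This is a minimal sketch and not an implementation taken from Trepess, Dua or applicant's specification; the function names, the example values and the linear scoring step are hypothetical.

```python
def concatenate(embedding, other_embedding):
    """Combine two embedding vectors by concatenation (hypothetical helper)."""
    return list(embedding) + list(other_embedding)


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical embedding of the input text and of other input information.
text_embedding = [0.2, -0.1, 0.7]
other_embedding = [0.5, 0.3]

# The combined vector has one slot per slot of each input vector.
combined = concatenate(text_embedding, other_embedding)

# Hypothetical "analysis": a linear score computed from the combined vector.
analysis_weights = [1.0, 0.0, 1.0, 2.0, 0.0]
score = dot(combined, analysis_weights)
```

Concatenation is only one way to combine vectors; it is used here because it is the operation described in the cited paragraphs of Dua.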

Claims 2, 4 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Trepess in view of Dua as applied to claim 1 above, and further in view of Keller et al. (U.S. Patent Application Pub. No. 2019/0294972 A1, hereinafter “Keller”).
Regarding claim 2, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein the TF-modifying neural network applies a … weighting.
In the same field, analogous art Dua teaches wherein the TF-modifying neural network applies a … weighting (as indicated above, the “TF-modifying neural network”, under the BRI, in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 43, 48-49, 116 and 126, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120, which is encoded”, “the neural network will output |V| numbers, which can be interpreted as a vector G(z), that represents the output BoN … the output vector G(z) represents a probability distribution over the ngrams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on TF-IDF values of the ngrams.” [i.e., generator neural network/G modifies frequency information in the TF vector], “the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text” [i.e., the generator neural network/G modifies/computes the TF vector], “process involves applying weights to the various scores, where the weights have been determined through training of the statistical model employed by the QA pipeline 500 and/or dynamically updated. For example, the weights for scores generated by algorithms that identify exactly matching terms and synonym may be set relatively higher”, “then the operation of the neural network in the generator is modified, e.g., weights are modified” [i.e., the neural network applies a weighting to scores]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach that a neural network applies a diagonal weighting matrix.
In the same field, analogous art Keller teaches that a neural network applies a diagonal weighting matrix (aside from repeating the claim language in paragraph 4, applicant’s specification states “multiplying the input TF vector 114 by a diagonal weighting matrix 120 of size g x g. This diagonal matrix includes weighting factors associated with respective dimensions of the input TF vector 114 along its diagonal, and 0 values at other positions” in paragraph 35. Therefore, “a diagonal weighting matrix”, under the BRI, in light of the specification, is any matrix of weights or weighting factors with weights/weighting values along a diagonal) (see, e.g., paragraphs 23 and 199-201, “input data is processed by the ANN [artificial neural network] to produce output data. … the input data may include one or more of image data, textual data”, “Before training, the weights of an artificial neural network need to be initialized. … by training a neural network to approximate uniform distribution before learning from data.”, “For recurrent artificial neural networks, starting with a uniformly scaled diagonal unit matrix for the hidden-to-hidden weights may work.”, “This configuration is sufficiently uniformly distributed to allow for convergence during training” [i.e., a neural network applies a diagonal matrix of weights and is trained]).
Trepess, Dua and Keller are analogous art because they are each related to systems and techniques for processing textual information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-14 and 40, Dua, paragraphs 56 and 66-67, and Keller, paragraphs 22-23 and claim 22). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Keller to provide techniques for “representing an artificial neural network utilizing individual paths each connecting an input of the ANN to an output of the ANN” where “the weights of an artificial neural network [are] initialized” by “starting with a uniformly scaled diagonal unit matrix for the hidden-to-hidden weights” (See, e.g., Keller, Abstract and paragraphs 199-200). Doing so would have allowed Trepess in view of Dua to use Keller’s ANN techniques so that “complexity of the ANN may be reduced, and the ANN may be trained and implemented in a much faster manner when compared to an implementation using fully connected ANN graphs” by “training a neural network to approximate uniform distribution before learning from data” in order to avoid “unpredictable output during the initial training phase”, as suggested by Keller (See, e.g., Keller, Abstract and paragraphs 199-200). 
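For illustration only, the operation quoted above from paragraph 35 of applicant's specification (multiplying a TF vector by a g x g matrix that holds weighting factors on its diagonal and 0 values elsewhere) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names and the numeric values are hypothetical and are not taken from the specification, Trepess, Dua or Keller.

```python
def diagonal_matrix(factors):
    """Build a g x g matrix with the given factors on the diagonal and 0 elsewhere."""
    g = len(factors)
    return [[factors[i] if i == j else 0.0 for j in range(g)] for i in range(g)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Hypothetical TF vector with one dimension per term (g = 4).
tf_vector = [3.0, 0.0, 1.0, 2.0]

# Hypothetical per-term weighting factors.
weighting_factors = [0.5, 1.0, 2.0, 0.1]

# Multiplying by the diagonal matrix yields a vector of the same size g.
intermediate = matvec(diagonal_matrix(weighting_factors), tf_vector)

# Because off-diagonal entries are 0, this equals scaling each term independently.
elementwise = [w * x for w, x in zip(weighting_factors, tf_vector)]
```

The comparison with `elementwise` shows that a diagonal weighting leaves the dimensionality unchanged and modifies each term's frequency value by its own factor only.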

Regarding claim 4, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Dua further discloses wherein the projection neural network is a fully-connected neural network (see, e.g., paragraphs 45 and 50, “The projection is a fully connected neural network layer”, “The projection can be seen as a fully connected neural network”). 
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on for explicitly disclosing a fully-connected neural network that applies a full weighting matrix.
In the same field, analogous art Keller teaches a fully-connected neural network that applies a full weighting matrix (aside from repeating the claim language in paragraph 4, paragraph 38 of applicant’s specification states “the full matrix 126 includes non-zero weighting factors interspersed throughout its rows and columns, not just in the diagonal positions.” Therefore, “a full weighting matrix”, under the BRI, in light of the specification, is any matrix of weights or weighting factors with non-zero weights/weighting values that are not solely in diagonal positions) (see, e.g., paragraphs 119, 151-152, 201 and 203, “Artificial neural networks may contain so-called fully connected layers, where each neural unit of a layer is connected to every neural unit of the next layer.” [i.e., a fully-connected neural network], “There is a strong connection between neural units and projection methods. Given a half space H+ defined by a normal vector w and a perpendicular distance from the origin b, the projection of a point x onto that half space is given”, “This is illustrated in FIG. 7, which is an interpretation 700 of projections onto halfspaces as ReLU neural units.” [i.e., a projection neural network], “initializing the weight matrices with a small positive constant works nicely when we subsample the artificial neural network … Hence the weights of the selected connections are initialized to a constant, for example, the inverse of the number of connections of a neural unit in order to be normalized … Thus the corresponding weight vector in a fully connected layer would consist of randomly or quasi-randomly chosen indices with zero and non-zero values. This configuration is sufficiently uniformly distributed”, “Given a trained artificial neural network, it is simple to compress this network to use weights in { -1, 0, 1} … paths are sampled backwards proportional to the weights of each neural unit visited along the path. This corresponds to importance sampling paths in a weighted graph.” [i.e., the neural network applies full weight matrices/vectors – with non-zero values that are not solely in diagonal positions]).
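For illustration only, the construction of "a full weighting matrix" set out above (non-zero weighting values that are not solely in diagonal positions) can be sketched with a single fully-connected layer that projects a vector into a lower-dimensional vector. This is a minimal sketch under stated assumptions: the helper names and the weight values are hypothetical and are not taken from the specification or from any cited reference.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def has_off_diagonal_weights(matrix):
    """Return True if any entry outside the diagonal positions is non-zero."""
    return any(
        value != 0.0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


# Hypothetical 2 x 4 weight matrix: every output unit is connected to every input.
full_weights = [
    [0.2, -0.1, 0.0, 0.4],
    [0.3, 0.5, -0.2, 0.1],
]

# Hypothetical 4-dimensional intermediate vector.
intermediate = [1.5, 0.0, 2.0, 0.2]

# The fully-connected layer maps 4 input dimensions to 2 output dimensions.
embedding = matvec(full_weights, intermediate)
```

Unlike a diagonal weighting, each output value here depends on several input dimensions, which is what allows the output to have fewer dimensions than the input.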

With respect to independent claim 18, Trepess discloses the invention as claimed including a method, implemented by one or more computing devices, for processing an instance of text (see, e.g., paragraphs 1 and 39-40, and claim 16, “This invention relates to information retrieval apparatus and methods”, “an information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data”, “The storage system operates in two general modes of operation. In a first mode, a set of information items (e.g. textual information items) is assembled on the disk storage 30 or on a network disk drive connected via the network 50 and is sorted and indexed ready for a searching operation.”, “A method of searching a set of information items and displaying the results of the search, the information items each having a set of characterising information features” [i.e., a method implemented by a system including a computer/computing device for processing textual information/an instance of text]), comprising:
receiving an instance of input text in response to an action taken by a user using a user computing device (see, e.g., paragraphs 15, 18-19 and 85, “The present invention addresses a technical problem of defining a search query for search information items and for refining a search for information items, which particularly advantageous for searching a [sic – and] navigating large amounts of data”, “features may be combined to form a search query in accordance with the Boolean operators specified the user.”, “a graphical user interface … for forming a search query … the conditions for the search are specified by Boolean operators. Accordingly a user may specify a search query in accordance with the information items elected in different rows of the interface”, “user then initiates the search, for example by pressing enter on the keyboard 70 or by using the mouse 80 to select a screen ‘button’ to start the search” [i.e., receiving search information items responsive to an action taken by a user using a computing device]);
generating an input term-frequency (TF) vector that includes frequency information relating to frequency of occurrence of terms in the input text, the input TF vector corresponding to a … vector that includes a dimension for each term in the input text (see, e.g., paragraphs 49, 53, 60 and 62, “Feature extraction is the process of transforming raw data into an abstract representation. These abstract representations can then be used for processes such as pattern classification, clustering and recognition. In this process, a so-called ‘feature vector’ is generated, which is an abstract representation of the frequency of terms used within a document”, “The size of the feature vector, and so the dimension of the term frequency histogram, is reduced”, “a term frequency histogram is generated for each document in the set … by counting the number of times words present in the dictionary (pertaining to that document set) occur within an individual document”, “a histogram may plot the frequency of over 50000 different terms, giving the histogram a dimension of over 50000” [i.e., generating an input term frequency feature vector including frequency information relating to frequency of occurrence/number of times terms/words occur in the input text/data and a dimension for each term in the text]);
using a TF-modifying … to modify the frequency information in the input TF vector, associated with respective terms, by respective machine-trained weighting factors, to produce an intermediate vector, the TF-modifying … being implemented by said one or more computing devices (see, e.g., paragraphs 39, 64, 67, 73-74 and 77, “information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data” [i.e., implemented by the computer/computing device], “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’”, “The size of the feature vector, and so the dimension of the term frequency histogram, is reduced” [i.e., using term-frequency/TF modifying to modify the frequency information in the TF vector, producing an intermediate, reduced dimension feature vector], “values in the feature vectors being used to train the map”, “initially each of these weights is set to a random value, and then, through an iterative process, the weights are ‘trained’. The map is trained by presenting each feature vector to the input nodes of the map.”, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document” [i.e., the system performs training and modifies the frequency information in the TF vector based on machine-trained weights/weighting factors to produce an intermediate vector with reduced dimensions after the training phase]) … ;
using a projection … to project the intermediate vector into a … vector having a dimensionality that is less than a dimensionality of the input TF vector, the … vector providing a distributed compact representation of semantic information in the input text, the projection … being implemented by said one or more computing devices (see, e.g., paragraphs 11, 39, 66-67 and 71, “The n-value vectors are then mapped onto smaller dimensional vectors (i.e. vectors having a number of values m) … which is substantially less than n … by multiplying the vector by an (nxm) ‘projection matrix’ formed of an array of random numbers … to generate vectors of smaller dimension where any two reduced-dimension vectors have much the same vector dot product as the two respective input vectors”, “information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data” [i.e., implemented by the computer/computing device], “Semantic Indexing-a technique whereby the dimension of the histogram is reduced by looking for groups of terms that have a high probability of occurring simultaneously in documents”, “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’ … reducing the dimension of the histogram by multiplying it by a matrix of random numbers”, “Once feature vectors have been generated for the document collection, thus defining the collection's information space, they are projected into a two-dimensional SOM [self-organizing map] at a step 150 to create a semantic map. … process of mapping to 2-D by clustering the feature vectors” [i.e., the system uses a projection to project the intermediate feature vector into a vector with less dimensions/2-D and providing a semantic map/distributed compact representation of semantic information in the input text]) … ;
utilizing the … vector to produce an output result (see, e.g., paragraphs 77-78 and 94-95, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document”, “presenting the feature vector for each document to the map to see where it lies yields an x, y map position for each document. These x, y positions … can be used to visualize the relationship between documents”, “FIGS. 7, 8 and 9 provide example illustrations of how information items are searched with respect to a search query and how the results of the search are displayed”, “search processor 404 may be arranged to search the information items and to generate search results, which identify information items which correspond to a search query. The mapping processor 412 may then receive data representing the results of the search identifying information items corresponding to the search query. The mapping processor then generates the x, y co-ordinates of the positions in the array corresponding to the identified information items.” [i.e., using the reduced dimension vector to produce the search results/output result]); and
providing the output result to an output device of the user computing device (see, e.g., paragraphs 14 and 85, “an information retrieval apparatus for searching a set of information items and displaying the results of the search”, “key words in the search enquiry area 250 are then compared with the information items … This generates a list of results, each of which is shown as a respective entry 280 in the list area 260. Then the display area 270 displays display points corresponding to each of the result items.”).
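For illustration only, the term-frequency histogram and "random mapping" steps attributed to Trepess above (counting dictionary words in a document, then multiplying the n-value vector by an (n x m) projection matrix of random numbers) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the example dictionary and text, the Gaussian distribution of the random entries and the scaling by the square root of m are hypothetical choices, since the cited paragraphs state only that the matrix is formed of random numbers.

```python
import random
from collections import Counter


def term_frequency_vector(text, dictionary):
    """Count how often each dictionary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in dictionary]


def random_projection_matrix(n, m, seed=0):
    """Build an n x m matrix of random numbers (Gaussian entries are an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(m)] for _ in range(n)]


def project(vector, matrix):
    """Multiply an n-value row vector by an n x m matrix, giving an m-value vector."""
    m = len(matrix[0])
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(m)
    ]


# Hypothetical dictionary (n = 5 terms) and input text.
dictionary = ["search", "query", "document", "map", "vector"]
text = "search query for a document map search"

tf_vector = term_frequency_vector(text, dictionary)

# Reduce the dimension from n = 5 to m = 2.
projection = random_projection_matrix(len(dictionary), 2, seed=0)
reduced = project(tf_vector, projection)
```

The random matrix in this sketch is fixed rather than trained, which corresponds to the random mapping described in the cited paragraphs and not to a machine-trained neural network.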
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing using a TF-modifying neural network to modify the frequency information in the input TF vector, … the TF-modifying neural network applying a … weighting … the TF-modifying neural network being implemented by said one or more computing devices and including at least one layer of neurons; 
using a projection neural network to project the intermediate vector into an embedding vector … , the projection neural network applying a … weighting … , the projection neural network being implemented by said one or more computing devices and including at least one layer of neurons; and
utilizing the embedding vector to produce an output result.
In the same field, analogous art Dua teaches using a TF-modifying neural network to modify the frequency information in the input TF vector (as indicated above, “a TF-modifying neural network”, under the BRI, in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 43 and 48-49, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120, which is encoded”, “the neural network will output |V| numbers, which can be interpreted as a vector G(z), that represents the output BoN … the output vector G(z) represents a probability distribution over the ngrams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on TF-IDF values of the ngrams.” [i.e., generator neural network/G modifies frequency information in the TF vector], “the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text” [i.e., using the generator neural network/G to modify/compute the TF vector]),
the TF-modifying neural network applying a … weighting (as indicated above, the “TF-modifying neural network”, under the BRI, in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 43, 48-49, 116 and 126, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120, which is encoded”, “the neural network will output |V| numbers, which can be interpreted as a vector G(z), that represents the output BoN … the output vector G(z) represents a probability distribution over the ngrams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on TF-IDF values of the ngrams.” [i.e., generator neural network/G modifies frequency information in the TF vector], “the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text” [i.e., the generator neural network/G modifies/computes the TF vector], “process involves applying weights to the various scores, where the weights have been determined through training of the statistical model employed by the QA pipeline 500 and/or dynamically updated. For example, the weights for scores generated by algorithms that identify exactly matching terms and synonym may be set relatively higher”, “then the operation of the neural network in the generator is modified, e.g., weights are modified” [i.e., the neural network applies a weighting to scores]),
the TF-modifying neural network being implemented by said one or more computing devices and including at least one layer of neurons (see, e.g., paragraphs 43 and 48, “The generator G 100 and discriminator D 130 may be implemented as software and/or hardware logic of one or more computing devices” [i.e., the TF-modifying neural network/generator neural network G being implemented by one or more computing devices], “Each row [projected z; n-gram embedding] of the concatenation matrix 116 is input to a neural network (NN) 118, which in some illustrative embodiments may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer” [i.e., the neural network includes multiple layers of perceptrons/neurons]);
using a projection neural network to project the intermediate vector into an embedding vector … the projection neural network applying a … weighting … , the projection neural network including at least one layer of neurons (see, e.g., paragraphs 43, 45-46, 48-50, 116 and 126, “discriminator neural network (discriminator) D 130 receives as input an encoded bag-of-ngrams 132, which may be the bag-of-ngrams G(z) 120, for example, or a bag-of-ngrams obtained from an actual (true) portion of natural language text, and outputs an output value D(G(z))”, “The projection is a fully connected neural network layer followed by a non-linearity such as the ReLU”, “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings”, “a neural network (NN) … may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer” [i.e., a projection neural network/D includes at least one layer of perceptrons/neurons], “in the discriminator D 130, an input of a BoN model G(z) 132 is received, which may be the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text.” [i.e., bag of ngrams/BoN includes an intermediate vector], “the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112.” [i.e., the projection neural network/D projects the intermediate vector into an embedding matrix/vector], “process involves applying weights to the various scores, where the weights have been determined through training of the statistical model”, “then the operation of the neural network in the generator is modified, e.g., weights are modified” [i.e., the neural network applies a weighting to scores]); and
utilizing the embedding vector to produce an output result (see, e.g., paragraphs 50 and 54-55, “After retrieving the ngram embeddings 110 and generating the matrix B 112, logic of the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112. It should be noted that after this multiplication operation, the embeddings of ngrams that are not in the input encoded BoN 132 will all have zero values. The result of this multiplication is projected by discriminator projection logic 140 to generate an output matrix 142.”, “concatenated matrix 222 comprises rows having a first portion corresponding to the corresponding rows in the matrix Z 214, a second portion corresponding to the embeddings vector 210 of the question q 204, and a third portion corresponding to the n-gram embedding matrix 218 of the vocabulary 220. The neural network 224 processes the concatenation matrix 222 in a similar manner as previously described above, to generate the output BoN”, “After retrieving the ngram embeddings 218 and generating the n-gram embedding matrix 218, logic of the discriminator 230 multiplies the encoded ngram in the BoN model G(z, q) by the ngram embeddings in the vector matrix … The result of this multiplication is projected by discriminator projection logic 234 to generate an output matrix” [i.e., using the embedding vector to generate/produce an output matrix/result]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
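For illustration only, the TF-IDF encoding of a bag-of-ngrams vector referenced in the quoted passages of Dua can be sketched as follows. This is a minimal sketch and is not taken from Dua or Trepess; the function name `tfidf_vector`, the example corpus, and the particular TF-IDF variant (raw count multiplied by the natural log of the inverse document frequency) are assumptions chosen for clarity.

```python
import math
from collections import Counter


def tfidf_vector(doc_ngrams, corpus, vocabulary):
    """Return one TF-IDF value per vocabulary slot for a single document.

    Assumed variant: tf = raw count in the document,
    idf = ln(number of documents / number of documents containing the ngram).
    """
    counts = Counter(doc_ngrams)
    n_docs = len(corpus)
    vector = []
    for ngram in vocabulary:
        tf = counts[ngram]
        df = sum(1 for doc in corpus if ngram in doc)
        # An ngram that appears in no document gets a zero weight.
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(tf * idf)
    return vector


# Hypothetical toy corpus; each document is a list of unigrams.
corpus = [["red", "shoe"], ["red", "hat"], ["blue", "hat"]]
vocabulary = ["red", "shoe", "hat", "blue"]
vec = tfidf_vector(corpus[0], corpus, vocabulary)
```

In this toy example, the slot for "shoe" (which appears in one document) receives a larger value than the slot for "red" (which appears in two), and slots for ngrams absent from the document are zero.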
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on for explicitly disclosing the neural network applying a diagonal weighting matrix which includes at least one negative weighting factor and the projection neural network applying a full weighting matrix.
In the same field, analogous art Keller teaches the neural network applying a diagonal weighting matrix (as indicated above, “a diagonal weighting matrix”, under the BRI, in light of the specification, is any matrix of weights or weighting factors with weights/weighting values along a diagonal) (see, e.g., paragraphs 23 and 199-201, “input data is processed by the ANN [artificial neural network] to produce output data. … the input data may include one or more of image data, textual data”, “Before training, the weights of an artificial neural network need to be initialized. … by training a neural network to approximate uniform distribution before learning from data.”, “For recurrent artificial neural networks, starting with a uniformly scaled diagonal unit matrix for the hidden-to-hidden weights may work.”, “This configuration is sufficiently uniformly distributed to allow for convergence during training” [i.e., a neural network applies a diagonal matrix of weights and is trained]) which includes at least one negative weighting factor (see, e.g., paragraph 213, “The neural unit under consideration then just adds up the -1 for sampled connections with negative weights and +1 for sampled connections with positive weights” [i.e., the weights in the weight matrix include at least one negative weight]) and
the projection neural network applying a full weighting matrix (as indicated above, “a full weighting matrix”, under the BRI, in light of the specification, is any matrix of weights or weighting factors with non-zero weights/weighting values that are not solely in diagonal positions) (see, e.g., paragraphs 151-152, 201 and 203, “There is a strong connection between neural units and projection methods. Given a half space H+ defined by a normal vector w and a perpendicular distance from the origin b, the projection of a point x onto that half space is given”, “This is illustrated in FIG. 7, which is an interpretation 700 of projections onto halfspaces as ReLU neural units.” [i.e., a projection neural network of neural units/neurons], “initializing the weight matrices with a small positive constant works nicely when we subsample the artificial neural network … Hence the weights of the selected connections are initialized to a constant, for example, the inverse of the number of connections of a neural unit in order to be normalized … Thus the corresponding weight vector in a fully connected layer would consist of randomly or quasi-randomly chosen indices with zero and non-zero values. This configuration is sufficiently uniformly distributed”, “Given a trained artificial neural network, it is simple to compress this network to use weights in { -1, 0, 1} … paths are sampled backwards proportional to the weights of each neural unit visited along the path. This corresponds to importance sampling paths in a weighted graph.” [i.e., the neural network applies full weight matrices/vectors – with non-zero values that are not solely in diagonal positions]).
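For illustration only, the distinction drawn above between applying a diagonal weighting matrix and applying a full weighting matrix can be sketched as follows. This is a minimal sketch and is not taken from Keller or the other cited references; the function names and numeric values are hypothetical. A diagonal matrix is represented only by its diagonal entries, so applying it scales each input element independently, whereas a full matrix mixes input elements through its off-diagonal entries.

```python
def apply_diagonal(diagonal_weights, vector):
    """Apply a diagonal weighting matrix, given by its diagonal entries."""
    return [w * x for w, x in zip(diagonal_weights, vector)]


def apply_full(matrix, vector):
    """Apply a full weighting matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def has_off_diagonal_weights(matrix):
    """True if any non-zero weight lies outside the diagonal positions."""
    return any(
        w != 0
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if i != j
    )


# Hypothetical values: the diagonal includes one negative weighting factor.
term_vector = [2.0, 4.0, 1.0]
intermediate = apply_diagonal([1.0, -0.5, 2.0], term_vector)

# Hypothetical 2x3 full matrix projecting the intermediate vector to 2 dimensions.
projection = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
embedding = apply_full(projection, intermediate)
```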
Alternatively, Keller also teaches the neural network … being implemented by said one or more computing devices and including at least one layer of neurons (see, e.g., paragraphs 34, 110, 118 and 165-166, “the ANN may be created, trained, and/or implemented utilizing the parallel processing unit (PPU)” [i.e., a neural network implemented by processing unit/computing device], “A neural unit or perceptron is the most basic model of a neural network.”, “Artificial neural networks are composed of neural units that are intended to imitate biological neurons.”, “In a fully connected artificial neural network, the number of weights … is equivalent to the number of connections, where n1 is the number of neural units in layer 1.” [i.e., the neural network including at least layer 1 of neural units/perceptrons/neurons]). 
Trepess, Dua and Keller are analogous art because they are each related to systems and techniques for processing textual information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-14 and 40, Dua, paragraphs 56 and 66-67, and Keller, paragraphs 22-23 and claim 22). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Keller to provide techniques for “representing an artificial neural network utilizing individual paths each connecting an input of the ANN to an output of the ANN” where “the weights of an artificial neural network [are] initialized” by “starting with a uniformly scaled diagonal unit matrix for the hidden-to-hidden weights” (See, e.g., Keller, Abstract and paragraphs 199-200). Doing so would have allowed Trepess in view of Dua to use Keller’s ANN techniques so that “complexity of the ANN may be reduced, and the ANN may be trained and implemented in a much faster manner when compared to an implementation using fully connected ANN graphs” by “training a neural network to approximate uniform distribution before learning from data” in order to avoid “unpredictable output during the initial training phase”, as suggested by Keller (See, e.g., Keller, Abstract and paragraphs 199-200). 

Regarding claim 19, as discussed above, Trepess in view of Dua and Keller teaches the method of claim 18.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing finding one or more candidate items, each of which has a candidate embedding vector having a prescribed relation to the embedding vector associated with the input text in a vector space, 
wherein the output result conveys information regarding said one or more candidate items to the user.
In the same field, analogous art Dua teaches finding one or more candidate items, each of which has a candidate embedding vector having a prescribed relation to the embedding vector associated with the input text in a vector space (see, e.g., paragraphs 46 and 56-57, “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings”, “the vector representation 210 of the word embedding of the input question 204. The vector r 240 is given as input to a neural network 242” [i.e., embedding vector associated with input text/question in a vector space], “The BoN model G(z, q) 232 is used to select a candidate answer to the input question q 204 generated by a QA system. … the BoN model 232 for the answer to the input question q 204 may be used to compare the ngrams in the BoN model 232 to the ngrams in a candidate answer.” [i.e., finding/selecting a candidate answer/item that has a candidate embedding vector related to/with a prescribed relation to the embedding vector associated with the input text/question q in a vector space]),
wherein the output result conveys information regarding said one or more candidate items to the user (see, e.g., paragraphs 83-84 and 88, “Content users input questions to cognitive system which implements the QA pipeline. The QA pipeline then answers the input questions using the content in the corpus of data by evaluating documents, sections of documents, portions of data in the corpus, or the like. … a response is provided containing one or more answers to the question.”, “QA pipeline receives an input question, parses the question to extract the major features of the question, uses the extracted features to formulate queries, and then applies those queries to the corpus of data. Based on the application of the queries to the corpus of data, the QA pipeline generates a set of … candidate answers to the input question”, “the cognitive system 300 provides a response to users in a ranked list of candidate answers/responses” [i.e., the output result/response conveys information regarding the candidate answers to the user]).
The motivation to combine Trepess and Dua is the same as discussed above with respect to claim 18.

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Trepess in view of Dua as applied to claim 1 above, and further in view of Jha et al. (U.S. Patent No. 10,956,522 B1, hereinafter “Jha”) and Schroeder et al. (U.S. Patent Application Pub. No. 2017/0161919 A1, hereinafter “Schroeder”).
Regarding claim 9, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach wherein said receiving comprises:
receiving an image of a product taken by the user using a digital camera; and
using optical character recognition to convert the image into the input text.
In the same field, analogous art Jha teaches wherein said receiving comprises:
receiving an image of a product taken by the user (see, e.g., col. 4, lines 39-40, col. 6, lines 34-38 and col. 8, lines 4-7, “A content item can take the form of … a photograph”, “users that have taken a particular action, such as … purchased or reviewed a product or service using an online marketplace” [i.e., the photograph/image can be an image of a purchased or reviewed product], “A user may send a request to the web server 240 to upload information (e.g., images or videos)” [i.e., receiving an image taken by a user]); and
using optical character recognition to convert the image into the input text (see, e.g., col. 8, lines 26-35, “The textual content of the content item 170 can take the form of text strings included in the content item 170 or can be in other forms such as text in an image … when the content item 170 includes images, audios, and/or videos, the text extraction module 310 may utilize an image-to-text algorithm such as optical character recognition (OCR)” [i.e., convert the image to input text using optical character recognition/OCR]).
Trepess, Dua and Jha are analogous art because they are each related to systems and techniques for processing textual and image information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, and Jha, col. 12, lines 8-21). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Jha to provide a “content review module 160 includes a text extraction module 310 that is configured to extract textual content of a content item 170” where “The content item 170 may be received … from an upload by a user. The textual content of the content item 170 can take the form of text strings included in the content item 170 or can be in other forms such as text in an image” (See, e.g., Jha, col. 8, lines 21-28). Doing so would have allowed Trepess in view of Dua to use Jha’s “content review module” with the “text extraction module” to “utilize an image-to-text algorithm such as optical character recognition (OCR) and/or a speech recognition algorithm to extract the additional textual content present in the content item” where “the extracted the words of the textual content are mapped into vectors using different embedding techniques such as term frequency-inverse document frequency (TF-IDF) vectorization, continuous big-of-words (CBOW) model, and/or skip-gram model. The mapping process may be conducted through a supervised or unsupervised neural network. The generation of the word vectors are based on aggregated word-to-word co-occurrence statistics from a corpus” so that “TF-IDF vectorization may be used to reduce the weight of common words that do not carry much semantic significance. The averaged vector represents an overall semantic characteristic of the textual content”, as suggested by Jha (See, e.g., Jha, col. 8, lines 31-34 and col. 12, lines 8-17 and 39-43). 
Although Trepess in view of Dua and Jha substantially teaches the claimed invention, Trepess in view of Dua and Jha is not relied on to teach an image … taken by the user using a digital camera.
In the same field, analogous art Schroeder teaches an image … taken by the user using a digital camera (see, e.g., paragraphs 29-30, “capturing one or more images using one or more cameras”, “The pose data may encode a translation and a rotation of a camera corresponding to the closest match known image” [i.e., an image taken by a digital camera]).
Trepess, Dua, Jha and Schroeder are analogous art because they are each related to systems and techniques for processing textual and image information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, Jha, col. 12, lines 8-21, and Schroeder, paragraphs 36-38). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua and Jha to incorporate the teachings of Schroeder to provide a system and “a method for training a network 300” where “The network 300 may be a convolutional neural network 300” and “The network training system uses a query image 312 [i.e., a query item], a positive (matching) image 310 [i.e., a positive training example/item], and a negative (non-matching) image 314 [i.e., a negative training example/item] for … training” (See, e.g., Schroeder, paragraph 36). Doing so would have allowed Trepess in view of Dua and Jha to use Schroeder’s training techniques so that “When training is complete, the network 300 maps different views of the same image close together and different images far apart” so that “This network can then be used to encode images into a nearest neighbor space” where the resulting trained “convolutional neural network outperforms current relocalization methods in both accuracy and efficiency”, as suggested by Schroeder (See, e.g., Schroeder, paragraphs 37 and 49). 

Regarding claim 10, as discussed above, Trepess in view of Dua, Jha and Schroeder teaches the device of claim 9.
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach wherein said utilizing comprises classifying the image based at least on the embedding vector.
In the same field, analogous art Jha teaches wherein said utilizing comprises classifying the image based at least on the embedding vector (see, e.g., col. 8, lines 29-34, col. 9, lines 21-28 and col. 12, line 62-col. 13, line 6, “when the content item 170 includes images … the text extraction module 310 may utilize an image-to-text algorithm such as optical character recognition (OCR) … to extract the additional textual content present in the content item”, “An embedding is created by the text used to generate the embedding. Each embedding corresponds to a noncompliant content item … Such embedding can be a vector that … represent[s] the syntactic and/or semantic relationships among other textual items that are also converted to embeddings” [i.e., the embedding vector], “layers of the neural network perform recognition of syntactic and/or semantic features by convolution, clustering, classification … The neural network is configured to receive the textual content of the noncompliant content item … The network is configured to output a vector that represents the semantic characteristic of the textual content of a noncompliant item after the input is analyzed … The output vector represents the semantic characteristic of the textual content and is served as the embedding of the noncompliant content item” [i.e., classifying the image as noncompliant based at least on the embedding vector]).
The motivation to combine Trepess, Dua and Jha is the same as discussed above with respect to claim 9.

Regarding claim 11, as discussed above, Trepess in view of Dua, Jha and Schroeder teaches the device of claim 9.
Although Trepess in view of Dua and Jha substantially teaches the claimed invention, Trepess in view of Dua and Jha is not relied on to teach using a machine-trained image-encoding component, implemented by the hardware logic circuitry, to convert the image into an image-based embedding vector, and
wherein said classifying uses a machine-trained model to classify the image based on the embedding vector associated with the input text and the image-based embedding vector.
In the same field, analogous art Schroeder teaches using a machine-trained image-encoding component, implemented by the hardware logic circuitry, to convert the image into an image-based embedding vector (see, e.g., paragraphs 33, 35, 37 and 42, “Using 128 dimensional vectors results in a lean, yet robust embedding for image based localization/relocalization methods. Such vectors can be used with convolutional neural networks”, “compact representation of an image (i.e., an embedding) may be used … A network of known N dimensional vectors corresponding to known training images, trained with … datasets (described below), may be configured to learn visual similarity (positive images) and dissimilarity (negative images). Based upon this learning process, the embedding is able to successfully encode a large degree of appearance change for a specific location or area in a relatively small data structure”, “When training is complete, the network … can then be used to encode images into a nearest neighbor space” [i.e., using a machine-trained image encoding component to convert the image into an image-based embedded vector], “neural network for use with localization/relocalization systems … allows the system to take advantage of emerging hardware acceleration” [i.e., implemented by hardware logic circuitry]), and
wherein said classifying uses a machine-trained model to classify the image based on the embedding vector associated with the input text and the image-based embedding vector (see, e.g., paragraphs 29, 32, 34, 36 and 40, “system compares a captured image with a plurality of known images to identify a known image that is the closest match to the captured image ... system accesses metadata for the closest match known image” [i.e., input text/metadata and image data], “the query image (e.g., query image 110) and the plurality of known images (e.g., known images 112a-112f) are transformed … each image is ‘embedded’ by projecting the image into a lower dimension”, “the embedding of an image 210 … the image 210 is an N dimensional vector 214 (e.g., 128 dimensional) representing the image”, “network training system uses a query image 312, a positive (matching) image 310, and a negative (non-matching) image 314 for one cycle of training. The query and positive images 312, 310 in FIG. 3 each depict the same object (i.e., a person) … The query and negative images 312, 314 in FIG. 3 depict different objects (i.e., people). The same network 310 learns all of the images 310, 312, 314, but is trained to make the scores 320, 322 of the two matching images 310, 312 as close as possible” [i.e., using a machine-trained neural network model for matching/classifying/identifying the image as being a certain object/person based on the text and image-based embedding vectors]).
The motivation to combine Trepess, Dua, Jha and Schroeder is the same as discussed above with respect to claim 9.

Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Trepess in view of Dua as applied to claim 1 above, and further in view of Schroeder.
Regarding claim 6, as discussed above, Trepess in view of Dua teaches the device of claim 5.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein said finding uses a … search technique to identify said one or more candidate items.
In the same field, analogous art Dua teaches wherein said finding uses a … search technique to identify said one or more candidate items (see, e.g., paragraphs 66 and 88, “The logic of the cognitive system implements the cognitive operation(s), examples of which include, but are not limited to, question answering, identification of related concepts within different portions of content in a corpus, intelligent search algorithms, such as Internet web page searches”, “Cognitive system users access the cognitive system 300 … and input questions/requests to the cognitive system 300 that are answered/processed based on the content in the corpus or corpora of data … the cognitive system 300 provides a response to users in a ranked list of candidate answers/responses” [i.e., system uses a search algorithm/technique to find the candidate answers/responses/items]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach wherein said finding uses a nearest neighbor search technique to identify said one or more candidate items.
In the same field, analogous art Schroeder teaches wherein said finding uses a nearest neighbor search technique to identify said one or more candidate items (see, e.g., paragraph 46, “The result of the comparison is identification of the nearest neighbor (i.e., best match) to the query data structure 414 corresponding to the query … The nearest neighbor is the known data structure (e.g., the known data structure 418b) having the shortest relative Euclidean distances to the query data structure 414.” [i.e., finding a query result uses a nearest neighbor technique to identify the best match/candidate item]). 
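For illustration only, a nearest neighbor search over embedding vectors using Euclidean distance, as described in the quoted passage of Schroeder, can be sketched as follows. This is a minimal sketch and is not Schroeder's implementation; the function name `nearest_neighbor`, the candidate labels, and the vector values are hypothetical.

```python
import math


def nearest_neighbor(query, candidates):
    """Return the key of the candidate vector closest to the query.

    `candidates` maps an item identifier to its embedding vector;
    closeness is measured by Euclidean distance.
    """
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


# Hypothetical two-dimensional embeddings for three candidate items.
candidates = {
    "item_a": [0.0, 0.0],
    "item_b": [1.0, 1.0],
    "item_c": [5.0, 5.0],
}
best_match = nearest_neighbor([0.9, 1.2], candidates)
```

Here the query vector lies closest to the vector for "item_b", so that item is returned as the best match.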
Trepess, Dua and Schroeder are analogous art because they are each related to systems and techniques for processing query and question information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, and Schroeder, paragraphs 36-38). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Schroeder to provide a system and “a method for training a network 300” where “The network 300 may be a convolutional neural network 300” and “The network training system uses a query image 312 [i.e., a query item], a positive (matching) image 310 [i.e., a positive training example/item], and a negative (non-matching) image 314 [i.e., a negative training example/item] for … training” (See, e.g., Schroeder, paragraph 36). Doing so would have allowed Trepess in view of Dua to use Schroeder’s training techniques so that “When training is complete, the network 300 maps different views of the same image close together and different images far apart” so that “This network can then be used to encode images into a nearest neighbor space” where the resulting trained “convolutional neural network outperforms current relocalization methods in both accuracy and efficiency”, as suggested by Schroeder (See, e.g., Schroeder, paragraphs 37 and 49). 

Regarding claim 13, as discussed above, Trepess in view of Dua teaches the device of claim 1.
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing wherein the TF-modifying neural network and the projection neural network operate based on a machine-trained model produced by a training environment, the training environment producing the machine-trained model.
In the same field, analogous art Dua teaches wherein the TF-modifying neural network and the projection neural network operate based on a machine-trained model produced by a training environment, the training environment producing the machine-trained model (see, e.g., paragraphs 25, 43, 50 and 52-53, “The GANs based mechanism for generating a bag-of-ngrams (BoN) model for use in performing natural language processing (NLP) operations … can be trained with stochastic gradient descent to quickly arrive at a trained generator”, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120 … The discriminator neural network (discriminator) D 130 receives as input an encoded bag-of-ngrams 132 … The generator G 100 and discriminator D 130 may be implemented as software and/or hardware logic of one or more computing devices, where the logic may be trained, for example, using a stochastic gradient descent algorithm, or other suitable training process”, “The projection can be seen as a fully connected neural network layer … where W is a matrix of parameters that is learned during training and each row of X is a projection of a row in the input matrix.”, “The proposed GAN based architecture … can … generate bag-of ngrams conditioned in other texts, e.g., other questions, or classes, e.g., sentiment classes positive/negative. … to generate a BoN that has positive sentiment. For such a purpose, the conditional GAN may be trained”, “the GANs based architecture may be used to implement a bag-of-ngrams (BoN) model … the GAN mechanism 200 can be trained to generate the bag-of-ngrams … which may be generated using a recurrent neural network (RNN) 208 to the matrix Z 214 generated by projection and replication logic 212 and the word embeddings matrix” [i.e., the TF-modifying neural network/generator neural network G and the projection neural network D operate on a machine-trained BoN model and the trained logic produced by a training environment – the GAN architecture including the one or more computing devices]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach producing the machine-trained model by:
collecting a plurality of training examples, the training examples including query items, positive items, and negative items, wherein each positive item has a positive relationship with an identified query item, and each negative item has a negative relationship with an identified query item; and
producing the machine-trained model by iteratively decreasing distances between embedding vectors associated with query items and their associated positive items, and iteratively increasing distances between embedding vectors associated with query items and their associated negative items.
In the same field, analogous art Schroeder teaches producing the machine-trained model by:
collecting a plurality of training examples, the training examples including query items, positive items, and negative items, wherein each positive item has a positive relationship with an identified query item, and each negative item has a negative relationship with an identified query item (see, e.g., paragraph 36, “The network training system uses a query image 312, a positive (matching) image 310, and a negative (non-matching) image 314 for one cycle of training. The query and positive images 312, 310 in FIG. 3 each depict the same object (i.e., a person), perhaps from different points of view. The query and negative images 312, 314 in FIG. 3 depict different objects (i.e., people).” [i.e., training by collecting positive and negative training images/examples with positive/matching and negative relationships with an identified query item/object]); and
producing the machine-trained model by iteratively decreasing distances between embedding vectors associated with query items and their associated positive items, and iteratively increasing distances between embedding vectors associated with query items and their associated negative items (see, e.g., paragraphs 35-36 and 38-39, “an embedding) may be used to compare the similarity of one location to another by comparing the Euclidean distance between the N dimensional vectors. A network of known N dimensional vectors corresponding to known training images, trained … with … datasets … learn visual similarity (positive images) and dissimilarity (negative images). Based upon this learning process, the embedding is able to successfully encode … in a relatively small data structure” [i.e., an embedding vector associated with positive and negative query items], “The same network 310 learns all of the images 310, 312, 314, but is trained to make the scores 320, 322 of the two matching images 310, 312 as close as possible and the score 324 of the non-matching image 314 as different as possible from the scores 320, 322 of the two matching images 310, 312. This training process is repeated with a large set of images.”, “Learning the weights of the neural network (i.e., the training algorithm) includes comparing a triplet of … a query image, positive image, and negative image. A first Euclidean distance between respective first and second pose data corresponding to the query and positive images is less than a predefined threshold, and a second Euclidean distance between respective first and third pose data corresponding to the query and negative images is more than the predefined threshold … The network produces a 128 dimensional vector for each image in the triplet, and an error term is non-zero if the negative image is closer (in terms of Euclidean distance) to the query image than the positive. … The network can be trained by decreasing a first Euclidean distance between first and second 128 dimensional vectors corresponding to the query and positive images in an N dimensional space, and increasing a second Euclidean distance between first and third 128 dimensional vectors respectively corresponding to the query and negative images in the N dimensional space. The final configuration of the network is achieved after passing a large number of triplets through the network.”, “the triplet convolutional neural network model embeds an image into a lower dimensional space where the system can measure meaningful distances between images. Through the careful selection of triplets, consisting of three images that form an anchor-positive pair of similar images and an anchor-negative pair of dissimilar images, the convolutional neural network can be trained” [i.e., producing trained neural network model by repeatedly/iteratively decreasing distances for embedding vector items and matching/positive items, and increasing distances for embedding vector items and non-matching/negative items]).
Trepess, Dua and Schroeder are analogous art because they are each related to systems and techniques for processing query and question information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, and Schroeder, paragraphs 36-38). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Schroeder to provide a system and “a method for training a network 300” where “The network 300 may be a convolutional neural network 300” and “The network training system uses a query image 312 [i.e., a query item], a positive (matching) image 310 [i.e., a positive training example/item], and a negative (non-matching) image 314 [i.e., a negative training example/item] for … training” (See, e.g., Schroeder, paragraph 36). Doing so would have allowed Trepess in view of Dua to use Schroeder’s training techniques so that “When training is complete, the network 300 maps different views of the same image close together and different images far apart” so that “This network can then be used to encode images into a nearest neighbor space” where the resulting trained “convolutional neural network outperforms current relocalization methods in both accuracy and efficiency”, as suggested by Schroeder (See, e.g., Schroeder, paragraphs 37 and 49). 
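For illustration only, the triplet-based training described in the cited passages of Schroeder (decreasing the query-positive distance and increasing the query-negative distance until the error term is zero) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not code from Schroeder or from the application: the function names, the squared Euclidean distance, the hinge with a margin of 1.0, the learning rate, and the choice to update the embedding vectors directly (rather than the weights of a neural network that produces them) are all hypothetical simplifications.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(q, p, n, margin=1.0):
    """Hinge-style error term (assumed form): zero once the negative item
    is farther from the query than the positive item by at least `margin`."""
    return max(0.0, sq_dist(q, p) - sq_dist(q, n) + margin)


def triplet_step(q, p, n, margin=1.0, lr=0.05):
    """One gradient-descent step on the triplet loss.

    Assumption: the embeddings themselves are the trainable parameters.
    In a real system the gradients would flow into network weights.
    """
    if triplet_loss(q, p, n, margin) == 0.0:
        return q, p, n  # nothing to correct for this triplet
    # Gradients of sq_dist(q, p) - sq_dist(q, n) with respect to q, p, n.
    new_q = [qi - lr * 2 * (ni - pi) for qi, pi, ni in zip(q, p, n)]
    new_p = [pi + lr * 2 * (qi - pi) for qi, pi in zip(q, p)]  # pull toward query
    new_n = [ni - lr * 2 * (qi - ni) for qi, ni in zip(q, n)]  # push away from query
    return new_q, new_p, new_n


# Hypothetical 2-D embeddings: the negative item starts closer than the positive.
q, p, n = [0.0, 0.0], [1.0, 0.0], [0.5, 0.5]
before = (sq_dist(q, p), sq_dist(q, n))
for _ in range(20):
    q, p, n = triplet_step(q, p, n)
after = (sq_dist(q, p), sq_dist(q, n))
print("query-positive distance:", before[0], "->", after[0])
print("query-negative distance:", before[1], "->", after[1])
```

In this toy run the query-positive distance shrinks and the query-negative distance grows over repeated steps, after which the error term is zero and further steps leave the vectors unchanged.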

Examiner’s Note: Claim 20 is directed to “A computer-readable storage medium for storing computer-readable instructions, the computer-readable instructions, when executed by one or more hardware processors, performing a method”. According to applicant’s original specification, the claimed computer-readable storage medium is limited to non-transitory computer-readable storage media (see paragraph 28: “the specific term ‘computer-readable storage medium’ expressly excludes propagated signals per se, while including all other forms of computer-readable media.”).

With respect to independent claim 20, Trepess discloses the invention as claimed, including a computer-readable storage medium for storing computer-readable instructions, the computer-readable instructions, when executed by one or more hardware processors, performing a method (see, e.g., paragraphs 14 and 39 and claims 29-30, “an information retrieval apparatus for searching a set of information items … The apparatus comprises a search processor operable to search the information items … A mapping processor operable to generate data representative of a map of information items from a set of information items identified in the search” [i.e., one or more hardware processors that perform operations], “an information storage and retrieval system based around a general-purpose computer 10 having a processor unit 20 including disk storage 30 for programs and data” [i.e., the computer/computing device includes programs that are executable computer-readable instructions stored in a computer-readable storage medium/disk storage 30], “Computer software having program code for carrying out a method”, “medium for providing program code” [i.e., software/program code includes executable computer-readable instructions stored in a memory/computer-readable storage medium for performing operations/carrying out the method]) that comprises:
receiving an instance of input text in response to an action taken by a user using a user computing device (see, e.g., paragraphs 15, 18-19 and 85, “The present invention addresses a technical problem of defining a search query for search information items and for refining a search for information items, which particularly advantageous for searching a [sic – and] navigating large amounts of data”, “features may be combined to form a search query in accordance with the Boolean operators specified the user.”, “a graphical user interface … for forming a search query … the conditions for the search are specified by Boolean operators. Accordingly a user may specify a search query in accordance with the information items elected in different rows of the interface”, “user then initiates the search, for example by pressing enter on the keyboard 70 or by using the mouse 80 to select a screen ‘button’ to start the search” [i.e., receiving search information items responsive to an action taken by a user using a computing device]);
generating an input term-frequency (TF) vector that includes frequency information relating to frequency of occurrence of terms in the input text (see, e.g., paragraphs 49 and 60, “Feature extraction is the process of transforming raw data into an abstract representation. These abstract representations can then be used for processes such as pattern classification, clustering and recognition. In this process, a so-called ‘feature vector’ is generated, which is an abstract representation of the frequency of terms used within a document”, “a term frequency histogram is generated for each document in the set … by counting the number of times words present in the dictionary (pertaining to that document set) occur within an individual document” [i.e., generating an input term frequency feature vector including frequency information relating to frequency of occurrence/number of times terms/words occur in the input text/documents]);
using a TF-modifying … to modify the frequency information in the input TF vector, associated with respective terms, by respective machine-trained weighting factors, to produce an intermediate vector (see, e.g., paragraphs 64, 67, 73-74 and 77, “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’”, “The size of the feature vector, and so the dimension of the term frequency histogram, is reduced” [i.e., using term-frequency/TF modifying to modify the frequency information in the TF vector, producing an intermediate, reduced dimension feature vector], “values in the feature vectors being used to train the map”, “initially each of these weights is set to a random value, and then, through an iterative process, the weights are ‘trained’. The map is trained by presenting each feature vector to the input nodes of the map.”, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document” [i.e., method includes training and modifies the frequency information in the TF vector based on machine-trained weights/weighting factors to produce an intermediate vector with reduced dimensions after the training phase]) … ;
using a projection … to project the intermediate vector into a … vector having a dimensionality that is less than a dimensionality of the input TF vector, the … vector providing a distributed compact representation of semantic information in the input text (see, e.g., paragraphs 11, 66-67 and 71, “The n-value vectors are then mapped onto smaller dimensional vectors (i.e. vectors having a number of values m) … which is substantially less than n … by multiplying the vector by an (nxm) ‘projection matrix’ formed of an array of random numbers … to generate vectors of smaller dimension where any two reduced-dimension vectors have much the same vector dot product as the two respective input vectors”, “Semantic Indexing-a technique whereby the dimension of the histogram is reduced by looking for groups of terms that have a high probability of occurring simultaneously in documents”, “The method selected for reducing the dimension of the term frequency histogram in the present embodiment is ‘random mapping’ … reducing the dimension of the histogram by multiplying it by a matrix of random numbers”, “Once feature vectors have been generated for the document collection, thus defining the collection's information space, they are projected into a two-dimensional SOM [self-organizing map] at a step 150 to create a semantic map. … process of mapping to 2-D by clustering the feature vectors” [i.e., the system uses a projection to project the intermediate feature vector into a vector with less dimensions/2-D and providing a semantic map/distributed compact representation of semantic information in the input text]) … ;
utilizing the … vector to produce an output result (see, e.g., paragraphs 77-78 and 94-95, “Once the map is trained, each of the documents can be presented to the map to see which of the output nodes is closest to the input feature vector for that document”, “presenting the feature vector for each document to the map to see where it lies yields an x, y map position for each document. These x, y positions … can be used to visualize the relationship between documents”, “FIGS. 7, 8 and 9 provide example illustrations of how information items are searched with respect to a search query and how the results of the search are displayed”, “search processor 404 may be arranged to search the information items and to generate search results, which identify information items which correspond to a search query. The mapping processor 412 may then receive data representing the results of the search identifying information items corresponding to the search query. The mapping processor then generates the x, y co-ordinates of the positions in the array corresponding to the identified information items.” [i.e., using the reduced dimension vector to produce the search results/output result]); and
providing the output result to an output device of the user computing device (see, e.g., paragraphs 14 and 85, “an information retrieval apparatus for searching a set of information items and displaying the results of the search”, “key words in the search enquiry area 250 are then compared with the information items … This generates a list of results, each of which is shown as a respective entry 280 in the list area 260. Then the display area 270 displays display points corresponding to each of the result items.”).
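For illustration only, the term-frequency histogram and “random mapping” steps quoted above from Trepess can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not code from Trepess or from the application: the function names, the whitespace tokenizer, the example dictionary and text, the Gaussian random entries, and the fixed seed are all hypothetical.

```python
import random
from collections import Counter


def term_frequency_vector(text, dictionary):
    """Count how many times each dictionary term occurs in the text."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    return [counts[term] for term in dictionary]


def random_projection_matrix(n, m, seed=0):
    """Build an n x m matrix of random numbers (Gaussian entries are an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]


def project(vector, matrix):
    """Multiply a length-n vector by an n x m matrix, giving a length-m vector."""
    m = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(m)]


# Hypothetical dictionary and input text.
dictionary = ["search", "query", "map", "document", "vector", "term"]
text = "search query map search document query search"

tf = term_frequency_vector(text, dictionary)
reduced = project(tf, random_projection_matrix(len(dictionary), 2))
print(tf)       # term counts in dictionary order
print(reduced)  # reduced-dimension vector (m = 2 < n = 6)
```

The sketch covers only the counting and dimension-reduction steps; the self-organizing map training and any machine-trained weighting discussed in the rejection are outside its scope.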
Although Trepess substantially discloses the claimed invention, Trepess is not relied on for explicitly disclosing using a TF-modifying neural network to modify the frequency information in the input TF vector, … the TF-modifying neural network including at least one layer of neurons; 
using a projection neural network to project the intermediate vector into an embedding vector … , the projection neural network including at least one layer of neurons;
utilizing the embedding vector to produce an output result,
the TF-modifying neural network and the projection neural network operating based on a machine-trained model produced by a training environment, the training environment producing the machine-trained model.
In the same field, analogous art Dua teaches using a TF-modifying neural network to modify the frequency information in the input TF vector, … the TF-modifying neural network including at least one layer of neurons (as indicated above, “a TF-modifying neural network”, under the BRI, in light of the specification, is any trained neural network usable to modify term-frequency information) (see, e.g., paragraphs 43 and 48-49, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120, which is encoded”, “Each row [projected z; n-gram embedding] of the concatenation matrix 116 is input to a neural network (NN) 118, which in some illustrative embodiments may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer [i.e., the neural network includes multiple layers of perceptrons/neurons] … the neural network will output |V| numbers, which can be interpreted as a vector G(z), that represents the output BoN … the output vector G(z) represents a probability distribution over the ngrams of the vocabulary 108. In one illustrative embodiment, this is a probability distribution based on TF-IDF values of the ngrams.” [i.e., generator neural network/G modifies frequency information in the TF vector], “the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text” [i.e., using the generator neural network/G to modify/compute the TF vector]),
using a projection neural network to project the intermediate vector into an embedding vector … , the projection neural network including at least one layer of neurons (see, e.g., paragraphs 43, 45-46 and 48-50, “discriminator neural network (discriminator) D 130 receives as input an encoded bag-of-ngrams 132, which may be the bag-of-ngrams G(z) 120, for example, or a bag-of-ngrams obtained from an actual (true) portion of natural language text, and outputs an output value D(G(z))”, “The projection is a fully connected neural network layer followed by a non-linearity such as the ReLU”, “Each of the ngrams in the vocabulary 108 may be represented as vector representations, i.e. embeddings”, “a neural network (NN) … may be a multi-layer perceptron (MLP) that uses a Rectified Linear Unit (ReLU) as the activation function of the output layer” [i.e., a projection neural network/D includes at least one layer of perceptrons/neurons], “in the discriminator D 130, an input of a BoN model G(z) 132 is received, which may be the BoN model G(z) 120 from the generator G 100 or a BoN generated by means of computing the TF-IDF vector from a real (actual) natural language text.” [i.e., bag of ngrams/BoN includes an intermediate vector], “the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112.” [i.e., the projection neural network/D projects the intermediate vector into an embedding matrix/vector]);
utilizing the embedding vector to produce an output result (see, e.g., paragraphs 50 and 54-55, “After retrieving the ngram embeddings 110 and generating the matrix B 112, logic of the discriminator 130 multiplies the input encoded ngram in the BoN 132 by the ngram embeddings matrix B 112. It should be noted that after this multiplication operation, the embeddings of ngrams that are not in the input encoded BoN 132 will all have zero values. The result of this multiplication is projected by discriminator projection logic 140 to generate an output matrix 142.”, “concatenated matrix 222 comprises rows having a first portion corresponding to the corresponding rows in the matrix Z 214, a second portion corresponding to the embeddings vector 210 of the question q 204, and a third portion corresponding to the n-gram embedding matrix 218 of the vocabulary 220. The neural network 224 processes the concatenation matrix 222 in a similar manner as previously described above, to generate the output BoN”, “After retrieving the ngram embeddings 218 and generating the n-gram embedding matrix 218, logic of the discriminator 230 multiplies the encoded ngram in the BoN model G(z, q) by the ngram embeddings in the vector matrix … The result of this multiplication is projected by discriminator projection logic 234 to generate an output matrix” [i.e., using the embedding vector to generate/produce an output matrix/result]),
the TF-modifying neural network and the projection neural network operating based on a machine-trained model produced by a training environment, the training environment producing the machine-trained model (see, e.g., paragraphs 25, 43, 50 and 52-53, “The GANs based mechanism for generating a bag-of-ngrams (BoN) model for use in performing natural language processing (NLP) operations … can be trained with stochastic gradient descent to quickly arrive at a trained generator”, “the generator neural network (generator) G 100 is trained to produce a bag-of-ngrams G(z) 120 … The discriminator neural network (discriminator) D 130 receives as input an encoded bag-of-ngrams 132 … The generator G 100 and discriminator D 130 may be implemented as software and/or hardware logic of one or more computing devices, where the logic may be trained, for example, using a stochastic gradient descent algorithm, or other suitable training process”, “The projection can be seen as a fully connected neural network layer … where W is a matrix of parameters that is learned during training and each row of X is a projection of a row in the input matrix.”, “The proposed GAN based architecture … can … generate bag-of ngrams conditioned in other texts, e.g., other questions, or classes, e.g., sentiment classes positive/negative. … to generate a BoN that has positive sentiment. For such a purpose, the conditional GAN may be trained”, “the GANs based architecture may be used to implement a bag-of-ngrams (BoN) model … the GAN mechanism 200 can be trained to generate the bag-of-ngrams … which may be generated using a recurrent neural network (RNN) 208 to the matrix Z 214 generated by projection and replication logic 212 and the word embeddings matrix” [i.e., the TF-modifying neural network/generator neural network G and the projection neural network D operate on a machine-trained BoN model and the trained logic produced by a training environment – the GAN architecture including the one or more computing devices]).
Trepess and Dua are analogous art because they are both related to systems and techniques for searching textual information (See, e.g., Trepess, Abstract and paragraph 40, and Dua, paragraphs 66-67).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess to incorporate the teachings of Dua to provide “a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation” and “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” where “one informative way to encode the ngrams is to use the Term Frequency-Inverse Document Frequency (TF-IDF) of the ngrams as the values in the encoding vector” and “the values of the vector slots in the BoN vector are calculated as the TF-IDF of the corresponding ngram” (See, e.g., Dua, Abstract and paragraphs 41-42). Doing so would have allowed Trepess to use Dua’s “Question Answering (QA) system” with “a GAN based mechanism for generating a bag-of-ngrams (BoN) model” in order to generate “a BoN model [that] is sufficient in many NLP tasks to obtain accurate and reliable results and thus, a mechanism for generating a BoN model from noise input without having to process a large amount of actual natural language content is a beneficial tool that greatly reduces the time and resources needed to generate an accurate model” and to use “the framework of the GAN based mechanisms of the illustrative embodiments [which] can be easily adapted to perform different NLP tasks,” as suggested by Dua (See, e.g., Dua, paragraphs 24, 27 and 42). 
Although Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach the training environment producing the machine-trained model by:
collecting a plurality of training examples, the training examples including query items, positive items, and negative items, wherein each positive item has a positive relationship with an identified query item, and each negative item has a negative relationship with an identified query item; and
producing the machine-trained model by iteratively decreasing distances between embedding vectors associated with query items and their associated positive items, and iteratively increasing distances between embedding vectors associated with query items and their associated negative items.
In the same field, analogous art Schroeder teaches producing the machine-trained model by:
collecting a plurality of training examples, the training examples including query items, positive items, and negative items, wherein each positive item has a positive relationship with an identified query item, and each negative item has a negative relationship with an identified query item (see, e.g., paragraph 36, “The network training system uses a query image 312, a positive (matching) image 310, and a negative (non-matching) image 314 for one cycle of training. The query and positive images 312, 310 in FIG. 3 each depict the same object (i.e., a person), perhaps from different points of view. The query and negative images 312, 314 in FIG. 3 depict different objects (i.e., people).” [i.e., training by collecting positive and negative training images/examples with positive/matching and negative relationships with an identified query item/object]); and
producing the machine-trained model by iteratively decreasing distances between embedding vectors associated with query items and their associated positive items, and iteratively increasing distances between embedding vectors associated with query items and their associated negative items (see, e.g., paragraphs 35-36 and 38-39, “an embedding) may be used to compare the similarity of one location to another by comparing the Euclidean distance between the N dimensional vectors. A network of known N dimensional vectors corresponding to known training images, trained … with … datasets … learn visual similarity (positive images) and dissimilarity (negative images). Based upon this learning process, the embedding is able to successfully encode … in a relatively small data structure” [i.e., an embedding vector associated with positive and negative query items], “The same network 310 learns all of the images 310, 312, 314, but is trained to make the scores 320, 322 of the two matching images 310, 312 as close as possible and the score 324 of the non-matching image 314 as different as possible from the scores 320, 322 of the two matching images 310, 312. This training process is repeated with a large set of images.”, “Learning the weights of the neural network (i.e., the training algorithm) includes comparing a triplet of … a query image, positive image, and negative image. A first Euclidean distance between respective first and second pose data corresponding to the query and positive images is less than a predefined threshold, and a second Euclidean distance between respective first and third pose data corresponding to the query and negative images is more than the predefined threshold … The network produces a 128 dimensional vector for each image in the triplet, and an error term is non-zero if the negative image is closer (in terms of Euclidean distance) to the query image than the positive. … The network can be trained by decreasing a first Euclidean distance between first and second 128 dimensional vectors corresponding to the query and positive images in an N dimensional space, and increasing a second Euclidean distance between first and third 128 dimensional vectors respectively corresponding to the query and negative images in the N dimensional space. The final configuration of the network is achieved after passing a large number of triplets through the network.”, “the triplet convolutional neural network model embeds an image into a lower dimensional space where the system can measure meaningful distances between images. Through the careful selection of triplets, consisting of three images that form an anchor-positive pair of similar images and an anchor-negative pair of dissimilar images, the convolutional neural network can be trained” [i.e., producing trained neural network model by repeatedly/iteratively decreasing distances for embedding vector items and matching/positive items, and increasing distances for embedding vector items and non-matching/negative items]).
Trepess, Dua and Schroeder are analogous art because they are each related to systems and techniques for processing query and question information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, and Schroeder, paragraphs 36-38). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua to incorporate the teachings of Schroeder to provide a system and “a method for training a network 300” where “The network 300 may be a convolutional neural network 300” and “The network training system uses a query image 312 [i.e., a query item], a positive (matching) image 310 [i.e., a positive training example/item], and a negative (non-matching) image 314 [i.e., a negative training example/item] for … training” (See, e.g., Schroeder, paragraph 36). Doing so would have allowed Trepess in view of Dua to use Schroeder’s training techniques so that “When training is complete, the network 300 maps different views of the same image close together and different images far apart” so that “This network can then be used to encode images into a nearest neighbor space” where the resulting trained “convolutional neural network outperforms current … methods in both accuracy and efficiency”, as suggested by Schroeder (See, e.g., Schroeder, paragraphs 37 and 49). 

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Trepess in view of Dua and Schroeder as applied to claim 13 above, and further in view of Jha.
Regarding claim 14, as discussed above, Trepess in view of Dua and Schroeder teaches the device of claim 13.
Although paragraph 84 of Dua discloses that “QA pipeline accesses a body of knowledge about the domain … where the body of knowledge (knowledgebase) can be organized in a variety of configurations, e.g., a structured repository of domain-specific information, such as ontologies, or unstructured data related to the domain, or a collection of natural language documents about the domain” and Trepess in view of Dua substantially teaches the claimed invention, Trepess in view of Dua is not relied on to teach wherein said collecting comprises collecting the query items, positive items, and negative items from a … knowledgebase, the … knowledgebase providing nodes associated with entities and links associated with relationships among the entities.
In the same field, analogous art Schroeder teaches wherein said collecting comprises collecting the query items, positive items, and negative items from a … knowledgebase, the … knowledgebase providing nodes associated with entities and links associated with relationships among the entities (see, e.g., FIG. 4 depicting database 416 of data structures 418a-e with metadata [i.e., a knowledgebase with query, positive and negative items] and paragraphs 15, 36 and 46, “accessing a database of the known images annotated with the respective metadata”, “system uses a query image 312, a positive (matching) image 310, and a negative (non-matching) image 314 …The query and positive images 312, 310 in FIG. 3 each depict the same object (i.e., a person), perhaps from different points of view. The query and negative images 312, 314 in FIG. 3 depict different objects (i.e., people).” [i.e., collecting query items/images 312, positive and negative items 310, 314], “The query data structure 414 corresponding to the query image 410 is compared to a database 416 of known data structures 418a-418e. Each known data structures 418a-418e is associated in the database 416 with corresponding metadata 420a-420e, which includes pose data for the system which captured the known image corresponding to the known data structure 418.” [i.e., collect query items in query data structure 414 and query image 410 and positive and negative items from a database/knowledgebase that provides data structures/nodes associated with entities/known images and links/relationships/correspondences among the entities/known images in known data structure 418]).
The motivation to combine Trepess, Dua and Schroeder is the same as discussed above with respect to claim 13.
Although Trepess in view of Dua and Schroeder substantially teaches the claimed invention, Trepess in view of Dua and Schroeder is not relied on to teach collecting … items from a relational knowledgebase, the relational knowledgebase providing nodes associated with entities.
In the same field, analogous art Jha teaches collecting … items from a relational knowledgebase, the relational knowledgebase providing nodes associated with entities (see, e.g., col. 8, lines 65-67 and col. 10, lines 10-12, “content store 320 may classify the known noncompliant content items by policies and store the noncompliant content in a relational database”, “the regular store 350 classifies the regular expressions by policies and stores the regular expressions in a relational database” [i.e., collecting items from a relational database/knowledgebase that provides nodes/database records associated with entities/classified expressions]).
Trepess, Dua, Schroeder and Jha are analogous art because they are each related to systems and techniques for processing textual and image information using neural networks (See, e.g., Trepess, Abstract and paragraphs 9-15, 18-19 and 40, Dua, paragraphs 56, 66-67 and 83, Schroeder, paragraphs 36-38, and Jha, col. 12, lines 8-21). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Trepess in view of Dua and Schroeder to incorporate the teachings of Jha to provide a “content review module 160 includes a text extraction module 310 that is configured to extract textual content of a content item 170” where “The content item 170 may be received … from an upload by a user. The textual content of the content item 170 can take the form of text strings included in the content item 170 or can be in other forms such as text in an image” (See, e.g., Jha, col. 8, lines 21-28). Doing so would have allowed Trepess in view of Dua and Schroeder to use Jha’s “content review module” with the “text extraction module” to “utilize an image-to-text algorithm such as optical character recognition (OCR) and/or a speech recognition algorithm to extract the additional textual content present in the content item” where “the extracted the words of the textual content are mapped into vectors using different embedding techniques such as term frequency-inverse document frequency (TF-IDF) vectorization, continuous bag-of-words (CBOW) model, and/or skip-gram model. The mapping process may be conducted through a supervised or unsupervised neural network. The generation of the word vectors are based on aggregated word-to-word co-occurrence statistics from a corpus” so that “TF-IDF vectorization may be used to reduce the weight of common words that do not carry much semantic significance. The averaged vector represents an overall semantic characteristic of the textual content”, as suggested by Jha (See, e.g., Jha, col. 8, lines 31-34 and col. 12, lines 8-17 and 39-43).

Allowable Subject Matter
Claims 15-17 are objected to as being dependent upon a rejected base claim (i.e., claim 1), but would be allowable, upon overcoming all of the rejections discussed above in items 8-16, if amended to address the above-noted rejections under 35 U.S.C. 103 and rewritten in independent form including all of the limitations of the base claim and any intervening claims (i.e., intervening claim 13 in the case of claim 15, and intervening claim 15 in the case of claims 16 and 17).
For example, with regard to dependent claims 15-17, the prior art of record does not anticipate, nor does it render obvious in any reasonable combination to one of ordinary skill in the art before the effective filing date of the claimed invention, the combination of recited limitations of claims 15-17, their base claim, independent claim 1, and their respective intervening claims, claims 13 and 15 (claim 15 depends from intervening claim 13, and claims 16-17 both depend from intervening claim 15).
As discussed above, Trepess in view of Dua and Schroeder teaches the device of claim 13.
With regard to claim 15, the prior art of record does not anticipate or render obvious the limitations “wherein the operations further include identifying a subset of hard negative items that meet a prescribed test of relatedness to respective query items, but are nonetheless not considered matches for those respective query items,
wherein said producing uses the hard negative items to generate the machine-trained model.”
Claims 16 and 17 are objected to as being dependent upon a rejected base claim (i.e., claim 1), but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims (i.e., intervening claims 13 and 15). For example, the prior art of record does not anticipate or render obvious the limitations recited in dependent claims 16 and 17, in combination with limitations of their base claim, independent claim 1, and intervening claims 13 and 15. 

Conclusion
The prior art made of record, listed on form PTO-892, and not relied upon, is considered pertinent to applicant's disclosure.
The examiner requests, in response to this office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RANDY K BALDWIN whose telephone number is (571)270-5222. The examiner can normally be reached on Mon - Fri 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/R.K.B./Examiner, Art Unit 2125


/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125