DETAILED ACTION

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 08/11/2020 and 09/29/2020. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Henderson et al. (US 10664527 B1), hereinafter referenced as Henderson.

Regarding claims 1, 8, and 15, Henderson teaches a computer-implemented method, executed on a computing device (Fig.1 and Fig.3; Col.6 Lines 22-29 Henderson discloses, the computing system 500 may be an end-user system that receives inputs from a user (e.g. via a keyboard) and determines similarity values (e.g. for determining a response to a query). Alternatively, the system may be a server that receives input over a network and determines the similarity values. Either way, these similarity values may be used to determine appropriate responses to user queries, as discussed with regard to FIG. 3.); a computer program product residing on a non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations (Fig.1; Col.4 Lines 59-67 Henderson discloses, FIG. 1 is a schematic of a system in accordance with an embodiment of the invention. System comprises a mobile telephone 551 as a user interface. However, the user could potentially contact the system using any device such as a laptop, tablet computer, smart watch et cetera. In the embodiment described herein, the user will provide commands via voice. However, the commands could also be inputted by a text interface. It should be noted that the system can be configured to work with both text and audio; and further Col.6 Lines 22-31 Henderson discloses, the computing system 500 may be an end-user system that receives inputs from a user (e.g. via a keyboard) and determines similarity values (e.g. for determining a response to a query). Alternatively, the system may be a server that receives input over a network and determines the similarity values. Either way, these similarity values may be used to determine appropriate responses to user queries, as discussed with regard to FIG. 3. For instance, the mass storage unit may store the response vectors and associated responses.); and a computing system (Fig.1 #500 called computing system) comprising: a memory (Fig.1 #505 called working memory); and a processor (Fig.1 #501 called processor) configured to receive at least one conversational phrase (Fig.3 #S101; Col.6 Lines 22-29 Henderson discloses, the computing system 500 may be an end-user system that receives inputs from a user ... the system may be a server that receives input over a network and determines the similarity values),
wherein the processor is further configured to determine a first probability for a subset of candidate responses of a plurality of candidate responses based upon, at least in part, context associated with the at least one conversational phrase, the at least one conversational phrase, and each context associated with the plurality of candidate responses (Fig. 4a #S203, #S205, #S207, #S209; Col.2 Lines 59-67 Henderson discloses, in the above embodiment, the retrieval of suitable/relevant responses occurs in the conversational context [determination of the candidate responses is based on user input of conversational context]. Pre-computing response vectors prior to inference time, which leads to very quick similarity computations and response ranking [called first probability]; the pre-computation of response vectors enables encoding new system responses without retraining the entire system; and further Col.12 Line 62 to Col.13 Line 8 Henderson discloses, at inference time, finding relevant candidates given a context reduces to computing hc for the context c, and finding nearby hr vectors [searching for the subset of candidate responses based on conversational context phrase provided to the system]. The hr vectors can all be pre-computed, and the nearest neighbour search can be optimized, giving an efficient search that can scale to billions of candidates. The fact that hr vectors can all be pre-computed enables the use of external optimisation libraries for nearest neighbours search that enable efficient search of a large pool of candidate responses. The optimisation of nearest neighbour search is a well-known problem in computer science and the above embodiment enables a direct implementation of readily available solutions [as shown here, providing a user query and then weighing it against the different query vectors from a large pool of data to provide the best match is well known]; furthermore Col.6 Lines 61-64 Henderson discloses, in step S207, given the user's input, all the responses in the index pertaining to restaurants in S are retrieved and the top N responses, r1, r2, ..., rN are taken with corresponding cosine similarity scores: s1 ≥ s2 ≥ ... ≥ sN),
wherein the processor is further configured to determine a second probability for the subset of candidate responses based upon, at least in part, the subset of candidate responses (Col.6 Line 51 to Col.7 Line 5 Henderson discloses, a parameter S is initialised as the set of restaurants in the city in step S203, the user inputs "a restaurant with a nice view of the castle". As before, in step S205, the input phrase is encoded as described above. In step S206, the encoded context vector produced in step S205 is put through an intent classifier that will be described below. If the intent classifier is negative then the process transfers to step S207. In step S207, given the user's input, all the responses in the index pertaining to restaurants in S are retrieved and the top N responses, r1, r2, ..., rN are taken with corresponding cosine similarity scores: s1 ≥ s2 ≥ ... ≥ sN. Many methods can be used for determining a nearest neighbour search for determining the top N responses. In an embodiment, an approximate nearest neighbour search is performed where the responses are clustered and the similarity of the encoded context vector to the clusters is calculated. Such a search can be considered to be a form of greedy routing in k-Nearest Neighbor (k-NN) graphs), the at least one conversational phrase, and the context associated with the at least one conversational phrase (Col.7 Lines 44-67 Henderson discloses, update S [the update to the parameter S, which was initialised earlier, is the second probability used to provide the most relevant responses] to the smallest set of restaurants with highest q whose q values sum up to more than a threshold t. Next, in step S211 the most relevant responses for S are collated and the top 2 are selected. If there are multiple relevant restaurants, one response is shown from each. When only one restaurant is relevant, the top N responses are all shown, and relevant photos are also displayed. A simple set of rules is used to provide a spoken response for the system (e.g. "One review of X said '_'"). The rules employ templates to allow a natural answer to be provided. For example, if the user inputs a query that is likely to return responses relating to multiple restaurants, for example "Where is good for Indian food?", a response will be provided with templates such as "Check out these places.", "I found ...". However, when the responses relate to just one restaurant, the system might respond "According to ...", "Check out these results ...". As noted above, the number of restaurants is reduced as the dialogue progresses. When the user asks a first question, N top responses are identified and these correspond to S restaurants. When the user asks a follow-up question, the context vector of the new query is generated and this is compared with the response vectors for the already identified S restaurants.),
and wherein the processor is further configured to determine at least one candidate response for the at least one conversational phrase based upon, at least in part, the first probability for the subset of candidate responses and the second probability for the subset of candidate responses (Fig.4a; Col.9 Lines 58-63 Henderson discloses, a single query vector will be produced here, but it will be a combination of the first and second query vectors. In an embodiment, this is realised by adding the vectors produced from both queries and weighting the latest question more heavily than the first question.).

Regarding claims 2, 9, and 16, Henderson teaches the subset of candidate responses of the plurality of candidate responses includes a predefined number of nearest neighbor context-response pairs of a plurality of context-response pairs observed in system training (Col.6 Line 61 to Col.7 Line 5 Henderson discloses, in step S207, given the user's input, all the responses in the index pertaining to restaurants in S are retrieved and the top N responses, r1, r2, ..., rN are taken with corresponding cosine similarity scores: s1 ≥ s2 ≥ ... ≥ sN. Many methods can be used for determining a nearest neighbour search for determining the top N responses. In an embodiment, an approximate nearest neighbour search is performed where the responses are clustered and the similarity of the encoded context vector to the clusters is calculated. Such a search can be considered to be a form of greedy routing in k-Nearest Neighbor (k-NN) graphs).
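
For illustration only, the top-N retrieval by cosine similarity quoted from Henderson above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Henderson's implementation: the function names, the list-of-pairs index layout, and the exhaustive (non-approximate) search are all hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_responses(context_vector, response_index, n):
    """Return the n (score, response) pairs with the highest similarity.

    response_index is a hypothetical list of (response_text, response_vector)
    pairs whose vectors were computed ahead of time.
    """
    scored = [
        (cosine_similarity(context_vector, vector), text)
        for text, vector in response_index
    ]
    # Sort by score only, highest first, so that s1 >= s2 >= ... >= sN.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Toy two-dimensional vectors, invented for the example.
index = [
    ("great view of the castle", [1.0, 0.0]),
    ("cheap pizza", [0.0, 1.0]),
    ("terrace with views", [0.8, 0.6]),
]
top = top_n_responses([1.0, 0.0], index, 2)
```

An approximate nearest neighbour search, as mentioned in the quoted passage, would replace the exhaustive scoring loop; the sketch keeps the exhaustive form for clarity.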

Regarding claims 3, 10, and 17, Henderson teaches the first probability defines a likelihood of the subset of candidate responses as candidate responses to the at least one conversational phrase based upon, at least in part, a distance between the context of the at least one conversational phrase and the at least one conversational phrase, and the context associated with each context-response pair observed in system training (Col.5 Lines 25-55 Henderson discloses, however, the encoder for a query is trained to output a vector (a context vector). The training has been performed using queries and corresponding responses. In an embodiment, these are selected from free text entries into social networking engines, restaurant review sites et cetera. The model has been trained so that the encoder for a query produces a context vector which is very similar to a response vector for a suitable response. The similarity between the two vectors can be determined using a similarity measure such as the cosine similarity. In one embodiment, the response vector does not only describe a phrase, but may also or additionally relate to a figure. The possible response vectors have been generated offline and reside in a database. Therefore, once the phrase is encoded using the trained encoder in the context vector is produced; the next stage is to look for similar vectors to this context vector in the response vector database. In one embodiment, similarity is measured using cosine similarity. There are many possible ways in which the search space can be optimised to look for similar vectors. Once the top n response vectors have been identified, these responses are output in a visual form as shown in FIG. 2. If images are also present, these may be output as well. For example, in one embodiment, a single image is output with maybe the 3 response vectors with the largest similarity scores.).

Regarding claims 4, 11, and 18, Henderson teaches the second probability defines a likelihood of the subset of candidate responses as candidate responses to the at least one conversational phrase based upon, at least in part, a distance between the context of the conversational phrase and the at least one conversational phrase, and the response associated with each context-response pair of the subset of candidate responses (Col.7 Lines 44-67 Henderson discloses, update S [the update to the parameter S, which was initialised earlier, is the second probability used to provide the most relevant responses] to the smallest set of restaurants with highest q whose q values sum up to more than a threshold t. Next, in step S211 the most relevant responses for S are collated and the top 2 are selected. If there are multiple relevant restaurants, one response is shown from each. When only one restaurant is relevant, the top N responses are all shown, and relevant photos are also displayed. A simple set of rules is used to provide a spoken response for the system (e.g. "One review of X said '_'"). The rules employ templates to allow a natural answer to be provided. For example, if the user inputs a query that is likely to return responses relating to multiple restaurants, for example "Where is good for Indian food?", a response will be provided with templates such as "Check out these places.", "I found ...". However, when the responses relate to just one restaurant, the system might respond "According to ...", "Check out these results ...". As noted above, the number of restaurants is reduced as the dialogue progresses. When the user asks a first question, N top responses are identified and these correspond to S restaurants. When the user asks a follow-up question, the context vector of the new query is generated and this is compared with the response vectors for the already identified S restaurants.).
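
For illustration only, the quoted update of S "to the smallest set of restaurants with highest q whose q values sum up to more than a threshold t" can be sketched as a greedy selection over per-restaurant scores. The function name and the dictionary input are hypothetical, and the sketch assumes the q values are non-negative; the quoted passage does not state how q is normalised.

```python
def smallest_set_above_threshold(scores, threshold):
    """Pick the fewest highest-scoring entries whose scores sum past the threshold.

    scores maps a restaurant name to its (assumed non-negative) score q.
    If the total never exceeds the threshold, every entry is returned.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = []
    running_total = 0.0
    for name, q in ranked:
        selected.append(name)
        running_total += q
        if running_total > threshold:
            break
    return selected


# Toy scores, invented for the example.
q_values = {"A": 0.5, "B": 0.3, "C": 0.2}
narrowed = smallest_set_above_threshold(q_values, 0.7)
```

Taking entries in descending order of q yields the smallest qualifying set, because any set of the same size drawn from lower-scoring entries cannot have a larger sum.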

Regarding claims 5, 12, and 19, Henderson teaches determining the at least one candidate response to the conversational phrase based upon, at least in part, the first probability for the subset of candidate responses and the second probability for the subset of candidate responses includes interpolating the first probability for the subset of candidate responses and the second probability for the subset of candidate responses (Fig.4c #S309 and Fig.4d; Col.9 Lines 32-63 Henderson discloses, in step S309, the responses for each restaurant are grouped and a score qe for each restaurant e∈S is determined. However, in step S309, a weighted query vector is used. As noted above, in this embodiment, the method allows the user to modify their searching goals. For a single query the query vector is the same for both flow chart 4(a) and 4(c). However, if the query is a subsequent query from the user, then instead of maintaining the matching restaurant set as the dialogue state, the method now records the user query (text) across different turns. The user query at each turn is encoded as a query vector, and these query vectors across turns are then weighted and summed (with a decay factor) to form a weighted query vector q'. Decay factor d is a parameter controlling the importance of the previous turns (d=0 means no context is used). FIG. 4(d) is a schematic showing 4 "turns" or queries where first the user asks "Is there a Chinese restaurant?" The query vector is just the embedded vector for the single query. However, if the user then asks a follow-up query, "Do any of these restaurants have vegetarian options?", a single query vector will be produced here, but it will be a combination of the first and second query vectors. In an embodiment, this is realised by adding the vectors produced from both queries and weighting the latest question more heavily than the first question.).
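
For illustration only, the decay-weighted combination of per-turn query vectors quoted above can be sketched as follows. The quoted passage does not give the exact weighting formula, so the sketch assumes that the latest turn has weight 1 and that each earlier turn is multiplied by one further factor of the decay d; the function name is hypothetical.

```python
def weighted_query_vector(turn_vectors, decay):
    """Combine per-turn query vectors into one vector, favouring recent turns.

    Assumed weighting: the turn k steps before the latest gets weight decay**k,
    so decay = 0 keeps only the latest turn (no earlier context is used).
    """
    if not turn_vectors:
        raise ValueError("at least one turn vector is required")
    dimension = len(turn_vectors[0])
    combined = [0.0] * dimension
    last = len(turn_vectors) - 1
    for turn, vector in enumerate(turn_vectors):
        weight = decay ** (last - turn)
        for j in range(dimension):
            combined[j] += weight * vector[j]
    return combined


# Two toy turn vectors, invented for the example.
turns = [[1.0, 0.0], [0.0, 1.0]]
q_prime = weighted_query_vector(turns, 0.5)
```

Under this assumed weighting the behaviour at d=0 matches the quoted statement that no context is used, and any d between 0 and 1 weights the latest question more heavily than the first.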

Claim Rejections - 35 USC § 103


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 6, 7, 13, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Henderson as stated above in view of Rout et al. (US 20210397631 A1), hereinafter referenced as Rout.

Regarding claims 6, 13, and 20, Henderson teaches the computer-implemented method of claim 1, the computer program product of claim 8, and the computing system of claim 15, respectively.
Henderson does not explicitly teach one or more of: concurrently training a first model configured to determine the first probability for the subset of candidate responses and a second model configured to determine the second probability for the subset of candidate responses, together; sequentially training the first model and the second model; and training, in parallel, the first model and the second model.
However, Rout explicitly teaches one or more of: concurrently training a first model configured to determine the first probability for the subset of candidate responses and a second model configured to determine the second probability for the subset of candidate responses, together; sequentially training the first model and the second model; and training, in parallel, the first model and the second model (Para. [0155] Rout discloses, a feature-based similarity model and a deep-learning model may be trained in parallel even if the two trained models are used in a joint manner, such as in the pipelined manner described in FIG. 4. For example, generating the feature-based similarity model and the deep-learning-based similarity model may include performing one or more first model training iterations using the one or more tagged data columns to generate the feature-based similarity model, wherein each first model training iterations of the one or more first model training iterations is configured to update the one or more similarity measure weight values in order to optimize a first model measure of error between first model outputs generated by the feature-based similarity model and ground-truth column relationship data for the one or more tagged data columns; and performing one or more second model training iterations using the one or more tagged data columns to generate the deep-learning-based similarity model, wherein each second model training iterations of the one or more second model training iterations is configured to update one or more image processing weight values of the one or more image processing models in order to optimize a second model measure of error between second model outputs generated by the deep-learning-based similarity model and the ground-truth column relationship data for the one or more tagged data columns. In some of the noted embodiments, the one or more first model training iterations and the one or more second model training iterations are determined independent of each other.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Rout with the teachings of Henderson, with a motivation to have two different models perform the two different probability computations and thereby improve the response time for outputting the results.

Regarding claims 7 and 14, Henderson teaches the computer-implemented method of claim 6, and the computer program product of claim 13, respectively.
Henderson does not explicitly teach wherein concurrently training the first model and the second model together includes one or more of: updating a predefined number of nearest neighbor context-response pairs via a separate computing device; and sharing at least one encoder between the first model and the second model.
However, Rout explicitly teaches wherein concurrently training the first model and the second model together includes one or more of: updating a predefined number of nearest neighbor context-response pairs via a separate computing device; and sharing at least one encoder between the first model and the second model (Para. [0155] Rout discloses, a feature-based similarity model and a deep-learning model may be trained in parallel even if the two trained models are used in a joint manner, such as in the pipelined manner described in FIG. 4. For example, generating the feature-based similarity model and the deep-learning-based similarity model may include performing one or more first model training iterations using the one or more tagged data columns to generate the feature-based similarity model, wherein each first model training iterations of the one or more first model training iterations is configured to update the one or more similarity measure weight values in order to optimize a first model measure of error between first model outputs generated by the feature-based similarity model and ground-truth column relationship data for the one or more tagged data columns; and performing one or more second model training iterations using the one or more tagged data columns to generate the deep-learning-based similarity model, wherein each second model training iterations of the one or more second model training iterations is configured to update one or more image processing weight values of the one or more image processing models in order to optimize a second model measure of error between second model outputs generated by the deep-learning-based similarity model and the ground-truth column relationship data for the one or more tagged data columns. In some of the noted embodiments, the one or more first model training iterations and the one or more second model training iterations are determined independent of each other.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Rout with the teachings of Henderson, with a motivation to have two different models trained in parallel and updated.
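
For illustration only, training two models in parallel with training iterations that are independent of each other, as in the passage quoted from Rout, can be sketched as follows. This is a minimal sketch under stated assumptions: the one-parameter linear models, the gradient-descent loop, and every name below are hypothetical stand-ins, not the similarity models described in Rout or Henderson.

```python
from concurrent.futures import ThreadPoolExecutor


def train_linear_weight(samples, learning_rate=0.1, iterations=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(iterations):
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
    return w


def train_in_parallel(first_samples, second_samples):
    """Train two models at the same time; neither run reads the other's state."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(train_linear_weight, first_samples)
        second = pool.submit(train_linear_weight, second_samples)
        return first.result(), second.result()


# Toy data, invented for the example: y = 2x for the first model, y = -x for the second.
first_w, second_w = train_in_parallel(
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, -1.0), (2.0, -2.0)],
)
```

Each training run updates only its own weight against its own error measure, which is the sense in which the two sets of iterations are independent in this sketch.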


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAISHAV SHAH whose telephone number is (571)272-3224. The examiner can normally be reached Monday - Friday 7:30 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, AMIR MEHRMANESH, can be reached on (571)270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.K.S./
Examiner, Art Unit 4163

/DANIEL C WASHBURN/
Supervisory Patent Examiner, Art Unit 2657