DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is in response to the response filed on 5/17/2022.  This action is FINAL.

	Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3, 5-6, 8, 10, 12-13, 15, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bird et al. (US 2021/0303989 A1) and further in view of Sheoran et al. (US 2021/0089331 A1).

As per claim 1 (Amended), Bird et al. teaches the invention as claimed including, “A method, comprising:
obtaining training code that includes a plurality of pieces of source code;”
Bird et al. teaches that one or more repositories are searched for source code programs written in a target programming language (0039).  A source code extraction component obtains several source code programs (plurality of pieces of source code) (0040).  Further note (0032) regarding the training phase 302, wherein word embeddings are generated for each token in the training dataset of the neural model and document embeddings are generated for the list of tokens in each method in the training dataset.
However Bird et al. does not explicitly appear to teach, “sorting the obtained training code according to a popularity associated with each of the pieces of source code, the popularity of a given piece of source code being determined based on one or more factors selected from a list including: a review rating of a given piece of source code, a number of times the given pieces of source code has been included in a watch list, a number of commits of the given piece of source code, a number of pull requests for the given piece of source code, and a number of positive reviews of the given piece of source code;
selecting one or more of the pieces of source code as filtered training code based on the popularity associated with each of the pieces of source code;”
Sheoran et al. teaches a training engine that receives training interaction data that includes survey data.  The survey data indicates experience ratings provided by users of the online platform.  The training interaction data is filtered such that the training engine trains the models of the environment evaluation system using only the training interaction data associated with positive user experiences.  If the experience ratings of the survey data are on a scale from 1 to 10, the training engine may train the models using only the training interaction data with an experience rating of 8 or greater (0047).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bird et al. with Sheoran et al. because both teach the training of machine learning models using training data sets.  Sheoran et al. teaches filtering training datasets by user ratings.  This will allow Bird et al. to train its models only with source code that users had a positive experience with.  Filtering training data will help preserve the quality of the training and would have been obvious to try.
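For illustration only, the rating-based filtering described in Sheoran et al. (0047) can be sketched as follows.  This is a minimal sketch, not the implementation of either reference: the TrainingRecord structure, the filter_by_rating function and the sample records are hypothetical, and only the threshold of 8 on a 1 to 10 scale is taken from the cited paragraph.

```python
from dataclasses import dataclass


@dataclass
class TrainingRecord:
    """Hypothetical training item paired with a user rating on a 1-10 scale."""
    content: str
    rating: int


def filter_by_rating(records, threshold=8):
    """Keep records rated at or above the threshold, highest rating first."""
    kept = [r for r in records if r.rating >= threshold]
    return sorted(kept, key=lambda r: r.rating, reverse=True)


# Hypothetical sample data: two well-rated snippets and one poorly rated one.
records = [
    TrainingRecord("def mul(a, b): return a * b", 8),
    TrainingRecord("def broken(: pass", 3),
    TrainingRecord("def add(a, b): return a + b", 9),
]

filtered = filter_by_rating(records)
```

Under these assumptions, only the records rated 8 and 9 remain in filtered, and they are ordered by rating before being passed on as training data.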
“extracting features from the filtered training code;
storing the extracted features in a code dataset;
mapping, by a deep neural network the extracted features from the filtered training code to natural language code vectors;”
Bird et al. further teaches that the selected source code is processed by a static analyzer to parse each method (extracted feature) into an abstract syntax tree (0041).  The source code analyzer extracts certain keywords (extracted features) from each method, such as method names, method invocations, enums, strings, literals, and comments, and the keywords are split up into tokens (0042).  Each token represents a word in the vocabulary of a neural model (deep neural network).  A word embedding is a learned representation for text-based tokens, where words that have a common meaning have a common representation.  Each normalized token is converted (mapped) into a respective word embedding (vector).  A neural network is used to generate the word embedding by producing a single vector for each normalized token.  The word embeddings are used to generate a document embedding for each method (0044-0048).  Therefore each method (feature) is mapped to an embedding (vector) using a neural network.  The examiner further states that it would have been inherent for the extracted features to be saved because once they are extracted they must be stored in memory in some fashion to be used to generate the embeddings.  Also see 0032 and figure 3.
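For illustration only, the token-to-embedding flow summarized above can be sketched as follows.  This is a minimal sketch under stated assumptions: the split_tokens and document_embedding functions are hypothetical, the small lookup table stands in for the learned word embeddings produced by a neural network, and averaging the word vectors is an assumed way of combining them that is not attributed to Bird et al.

```python
import re


def split_tokens(keyword):
    """Split a camelCase or snake_case keyword into lowercase tokens."""
    parts = re.findall(r"[A-Za-z][a-z]*|\d+", keyword)
    return [p.lower() for p in parts]


def document_embedding(tokens, word_vectors):
    """Average the vectors of known tokens (assumed combination rule)."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional word embeddings standing in for learned ones.
WORD_VECTORS = {
    "read": [1.0, 0.0],
    "file": [0.0, 1.0],
    "lines": [1.0, 1.0],
}

tokens = split_tokens("readFileLines")
method_vector = document_embedding(tokens, WORD_VECTORS)
```

In this sketch each method name yields a list of tokens, each token maps to one vector, and the method as a whole maps to a single vector of the same dimension.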
“receiving a natural language search query for source-code suggestions;
mapping, by the deep neural network, the natural language search query to a natural language search vector;”
Bird et al. teaches, the system receives a natural language query that is then converted (mapped) to a document query embedding (mapping the natural language query to a search vector) (0057).  Also see 0032 and figure 3.
“comparing the natural language search vector to the natural language code vectors; and
responding to the natural language search query with source code based on the comparing the natural language search vector to the natural language code vectors.”
Bird et al. teaches that a search component finds document embeddings (vectors) in the document embedding database that are similar to (comparing) the document query embedding (vector) and outputs the corresponding code snippets (0032).  Also see 0048, 0058-0059 and figure 3.
As per claim 3 (Amended), Bird et al. further teaches, “The method of claim 1, wherein the extracting features from the filtered training code comprises extracting at least one of code snippets, software documentation, code comments, or test-case logs from the training code.”
The source code analyzer extracts certain keywords from each method such as, method names, method invocations, enums, strings, literals, and comments and the keywords are split up into tokens (0042).  Further note (0032) regarding the training phase 302 wherein word embeddings are generated for each token in the training dataset of the neural model and document embeddings are generated for the list of tokens in each method in the training dataset.
As per claim 5, Bird et al. further teaches, “The method of claim 1, further comprising:
generating code summaries for the extracted features by the deep neural network, the code summaries including natural language descriptions of the extracted features; and
wherein the mapping the extracted features to the natural language code vectors is based on the generated code summaries.”
The selected source code is processed by a static analyzer to parse each method (feature) into an abstract syntax tree (0041).  The source code analyzer extracts certain keywords from each method, such as method names, method invocations, enums, strings, literals, and comments, and the keywords are split up into tokens (code summaries) (0042).  Each token (code summary) represents a word in the vocabulary of a neural model (deep neural network).  A word embedding (vector) is a learned representation for text-based tokens, where words that have a common meaning have a common representation.  Each normalized token (code summary) is converted (mapped) into a respective word embedding.  A neural network is used to generate the word embedding by producing a single vector for each normalized token.  The word embeddings are used to generate a document embedding for each method (0044-0048).
As per claim 6, Bird et al. further teaches, “The method of claim 1, wherein the comparing the natural language search vector to the natural language code vectors is based on a cosine similarity between the natural language search vector and the natural language code vectors.”
Bird et al. teaches a search component that searches the document embedding database to find code snippets similar to the document query embedding.  An approximate nearest neighbor search is used to determine the most similar document embedding based on cosine similarity (0058).
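For illustration only, a comparison based on cosine similarity can be sketched as follows.  The cosine similarity of vectors u and v is their dot product divided by the product of their lengths.  This is a minimal sketch under stated assumptions: the function names, snippet labels and vectors are hypothetical, and the sketch performs an exhaustive search over all stored vectors rather than the approximate nearest neighbor search cited at (0058).

```python
import math


def cosine_similarity(u, v):
    """Return dot(u, v) / (|u| * |v|), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_vector, code_vectors):
    """Exhaustively pick the label whose vector is closest to the query."""
    return max(
        code_vectors,
        key=lambda label: cosine_similarity(query_vector, code_vectors[label]),
    )


# Hypothetical code vectors keyed by a label for each code snippet.
code_vectors = {
    "read_file": [1.0, 0.0, 0.2],
    "sort_list": [0.0, 1.0, 0.1],
}

best = most_similar([0.9, 0.1, 0.3], code_vectors)
```

With these hypothetical vectors the query points mostly along the first axis, so the "read_file" entry is selected.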
As per claims 8, 10, 12-13, 15, 17 and 19, these claims contain similar limitations to claims 1, 3, and 5-6.  Therefore claims 8, 10, 12-13, 15, 17 and 19 are rejected for the same reasons as claims 1, 3, and 5-6.
Claims 2, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bird et al. (US 2021/0303989 A1) and Sheoran et al. (US 2021/0089331 A1) as applied to claims 1, 8 and 15 above, and further in view of Moorthi et al. (US 9,898,393 B2) and Vaithiyanathan et al. (US 11,176,329 B2).
Regarding claim 2 (Amended), Bird et al. teaches that code snippets generated from the neural model are based on a source code repository having a higher-quality source code, typically having been compiled and successfully executed (0021).  However Bird et al. does not explicitly appear to teach, “The method of claim 1, further comprising generating a test-case feature and storing the test-case feature in the code dataset, wherein generating the test-case feature comprises:
parsing the obtained filtered training code;
determining a test-execution environment for the parsed training code based on the parsed training code;
validating execution of the parsed training code based on the test-execution environment; and
generating a test-case feature of the parsed training code, wherein the test-case feature indicates whether the parsed training code is executable in the test-execution environment.”
Moorthi et al. teaches identifying a code repository that stores source code associated with an application to be tested.  The source code and the at least one test in a test suite are analyzed (parsing).  Configuration requirements required to execute the test suite are generated automatically based on the analysis of the source code and the at least one test suite.  Based on the analysis, an execution environment required for the application for the at least one test is determined, along with at least a portion of the source code required for executing the at least one test.  Instructions are generated for executing a plurality of execution instances, wherein at least some of the plurality of execution instances are configured to define the execution environment based on the configuration requirements generated automatically from the analysis of the source code and the at least one test.  The at least a portion of the source code and the at least one test are executed, and results are received from execution of the plurality of execution instances (claim 1).  Also see figure 8.  The examiner states that it would have been obvious to one of ordinary skill in the art for the results to be saved.
Vaithiyanathan et al. teaches a source code analyzer (parser) which determines whether newly provided code is appropriate for storage and future use as source code (column 2, lines 60-65).  Source code may be flagged for human review (column 5, lines 59-63).  
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bird et al. with Moorthi et al. and Vaithiyanathan et al.  Bird et al. teaches that code snippets generated from the neural model are based on a source code repository having a higher-quality source code, typically having been compiled and successfully executed (0021).  Vaithiyanathan et al. teaches a source code analyzer (parsing) which determines whether newly provided code is appropriate for storage and future use as source code (column 2, lines 60-65).  Source code may be flagged for human review (column 5, lines 59-63).  Therefore together Bird et al. and Vaithiyanathan et al. teach validating code prior to saving it into the repository.  One would not want to train a neural model with bad code.  However Bird et al. and Vaithiyanathan et al. do not teach the determining of an execution environment and validating execution in the test environment.  This is taught by Moorthi et al.  As stated by Bird et al., higher-quality code that has been compiled and executed is stored in the repository.  Together Bird et al., Moorthi et al. and Vaithiyanathan et al. would allow one to parse source code to determine an execution environment and execute the code in the execution environment to gather results.  This will allow the system to store only validated code in a repository to use for training.
As per claims 9 and 16, claims 9 and 16 contain similar limitations to claim 2.  Therefore claims 9 and 16 are rejected for the same reasons as claim 2.
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bird et al. (US 2021/0303989 A1) and Sheoran et al. (US 2021/0089331 A1) as applied to claims 1, 8 and 15 above, and further in view of Pujar et al. (US 2022/0004365 A1).
As per claim 4, Bird et al. does not explicitly appear to teach, “The method of claim 1, wherein the deep neural network comprises at least one of a multilayer perceptron (MLP) network, a long short-term memory (LSTM) network, and an average-stochastic gradient descent weight-dropped long short-term memory (AWD LSTM) network.”
Pujar et al. teaches the use of a long-short-term-memory (LSTM) model to create contextual embeddings by ingesting tokens (0027 and 0033).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bird et al. with Pujar et al. because both teach machine learning for natural language processing.  Both teach the creation of tokens and converting the tokens into embeddings using a neural network.  Pujar et al. teaches this process using an LSTM model, which is a well-known model and would have been obvious to try.
As per claims 11 and 18, claims 11 and 18 contain similar limitations to claim 4.  Therefore claims 11 and 18 are rejected for the same reasons as claim 4.
Claims 7, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bird et al. (US 2021/0303989 A1) and Sheoran et al. (US 2021/0089331 A1) as applied to claims 1, 8 and 15 above, and further in view of Trim et al. (US 2022/0012018 A1).
As per claim 7 (Amended), Bird et al. further teaches, “The method of claim 1, 
wherein the mapping the natural language search query to the natural language search vector comprises:”
Bird et al. teaches the system receives a natural language query that is then converted to a document query embedding (mapping the natural language query to a search vector) (0057).  
However Bird et al. does not explicitly appear to teach, “organizing the natural language search query into a first natural language search section and a second natural language search section;” 
Trim et al. teaches a controller that receives a natural language command (0043).  The controller determines one or more tasks that are required to fulfil the command.  The controller identifies the intent of the command in order to determine the tasks by utilizing NLP techniques to determine what the operator intended the generated code to do in order to determine the intent and upon determining the intent the controller may separate this out into individual tasks that code can execute (0044-0045).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Bird et al. with Trim et al. because both teach natural language processing.  Trim et al. teaches that a natural language command can comprise more than one task and a method of determining the plurality of tasks from the natural language command.  Bird et al. teaches the receiving of a single natural language query (task) to find a code snippet.  Together the natural language query received by Bird et al. can now comprise more than one task/query.  Trim et al. will allow Bird et al. to process this natural language command into a plurality of tasks/queries, therefore allowing Bird et al. to determine a code snippet for each task/query, and would have been obvious to try.
“mapping the first natural language search section to a first natural language search vector; and 
mapping the second natural language search section to a second natural language search vector, wherein the comparing the natural language search vector and the natural language code vectors comprises: comparing the first natural language search vector to the natural language code vectors; and 
comparing the second natural language search vector to the natural language code vectors, and 
wherein the source code is further based on the comparing the first natural language search vector to the natural language code vectors, and the second natural language search vector to the natural language code vectors responding to the natural language search query with the source code includes:
identifying a first pieces of source code based on the comparing the first natural language search vector to the natural language code vectors; and
identifying a second piece of source code based on the comparing the second natural language search vector to the natural language code vectors.”
Bird et al. teaches the system receives a natural language query that is then converted to a document query embedding (mapping the natural language query to a search vector) (0057).  Also see 0032 and figure 3.  A search component finds document embeddings in the document embedding database that are similar to the document query embedding and outputs the corresponding code snippets (0032).  Also see 0048, 0058-0059 and figure 3.
The examiner states that Bird would be able to perform the above mapping and comparing steps for multiple queries and return the code snippet for each query.  As stated above, Trim et al. teaches that a natural language command (query) can comprise multiple tasks (queries).  Therefore, each task (query) will be converted into a document query embedding and used to find similar document embeddings from a document embedding database and output the corresponding code snippet for each task.
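For illustration only, the combination described above (separating a command into tasks, mapping each task to its own vector, and comparing each vector to the stored code vectors) can be sketched as follows.  This is a minimal sketch under stated assumptions: splitting on the word "and" is a simple stand-in for the NLP intent analysis of Trim et al., the bag-of-words embed function is a stand-in for a neural network embedding, and the vocabulary, snippets and function names are hypothetical.

```python
import math

# Hypothetical fixed vocabulary; a real system would use learned embeddings.
VOCAB = ["read", "file", "sort", "list"]


def embed(text):
    """Map text to a bag-of-words count vector over VOCAB (stand-in embedding)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical code snippets with precomputed code vectors.
CODE_VECTORS = {
    "open(path).read()": embed("read file"),
    "sorted(items)": embed("sort list"),
}


def split_query(query):
    """Separate a query into sections on the word 'and' (assumed heuristic)."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def search(query):
    """Return one (section, best matching snippet) pair per query section."""
    results = []
    for section in split_query(query):
        vector = embed(section)
        best = max(CODE_VECTORS, key=lambda s: cosine(vector, CODE_VECTORS[s]))
        results.append((section, best))
    return results


results = search("read a file and sort the list")
```

Here the single query yields two sections, each section is mapped to its own search vector, and each vector identifies a separate snippet.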
As per claims 14 and 20, claims 14 and 20 contain similar limitations to claim 7.  Therefore claims 14 and 20 are rejected for the same reasons as claim 7.

Response to Arguments
Applicant's arguments filed 5/17/2022 have been fully considered but are moot due to amendments.  The 112(b) rejection for claims 7, 14 and 20 has been lifted due to amendments.  

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK A GOORAY whose telephone number is (571)270-7805. The examiner can normally be reached Monday - Friday 10:00am - 6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock can be reached on 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARK A GOORAY/               Examiner, Art Unit 2199         

/LEWIS A BULLOCK  JR/               Supervisory Patent Examiner, Art Unit 2199