Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the claim listing filed on 11/25/2019.
Claims 1-20 are pending.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 14 and 20 recite the limitation “wherein the neural network”. There is insufficient antecedent basis for “the neural network” in the claims.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
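A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.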

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wehr et al., “Learning Semantic Vector Representations of Source Code via a Siamese Neural Network”, 4-2019, Retrieved from https://arxiv.org/abs/1904.11968, pages 1-6.
As per Claim 1: Wehr discloses, 
1. A computer-implemented method comprising:

creating, by one or more computer processors, a dictionary for each source code commit in a set of historical source code commits associated with a software deployment, wherein each dictionary comprises a commit level, an associated defect label, and associated logs;
See p. 2, right column, and, in light of the specification at [0012] and [0023], Sec. IV-A and IV-B, p. 3, for the dictionary and the associated logs. For the associated defect label, see Sec. IV-C, Error Analysis, p. 4. For the commit level, see, e.g., Fig. 1, where the f1 implementation produces vector v1 (mentioned within Sec. III, p. 2).
Here, a commit level, an associated defect label, and associated logs are data limitations in the claimed recitation.

creating, by one or more computer processors, a similarity model based on the created dictionary for each source code commit in the set of historical source code commits;
(See p. 2, Sec. III, Proposed Model, especially the right column: “As is typical in NLP, we limit our token vocabulary to the most common tokens, replacing the others with a generic “<UNK>” token. Although [11] argue that such a representation is inappropriate because of the wide variety in tokens, we found that a small vocabulary size can cover the majority of tokens encountered (Section IV-A).”; p. 2, first text portion: “You can also view the baseline vectors as a histogram of token occurrence.”; and Figure 1, for example, the f1 implementation --> abstract syntax tree --> vector embedding.)
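For illustration only, the quoted passages (a token vocabulary limited to the most common tokens, an “<UNK>” replacement for the rest, and a histogram of token occurrence) can be sketched as follows. This is a minimal sketch and not the reference's implementation; the function names, the example token lists, and the vocabulary size are assumptions chosen for this example.

```python
from collections import Counter

UNK = "<UNK>"  # generic token standing in for every out-of-vocabulary token


def build_vocab(token_lists, max_size):
    """Map the max_size most common tokens to indices; reserve the last index for <UNK>."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}
    vocab[UNK] = len(vocab)
    return vocab


def histogram_vector(tokens, vocab):
    """Count how often each vocabulary entry occurs; unknown tokens count toward <UNK>."""
    vec = [0] * len(vocab)
    for tok in tokens:
        vec[vocab.get(tok, vocab[UNK])] += 1
    return vec


# Hypothetical token streams for two small code fragments.
corpus = [["return", "x", "+", "x"], ["return", "x", "*", "y"]]
vocab = build_vocab(corpus, max_size=2)  # keeps "x" (3 uses) and "return" (2 uses)
vec = histogram_vector(["return", "x", "-", "x"], vocab)
print(vec)  # [2, 1, 1]: two "x", one "return", one <UNK>
```

The sketch only shows that such a vector depends on token counts, not on what the stored entries are called.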

generating, by one or more computer processors, a vector embedding for a source code commit pair based on a set of log differences between source code commit pairs utilizing the created similarity model, 
(See p. 2, Sec. III, right column, for the vectors v1 and v2. See Fig. 1, p. 3: the pair of f1 and f2 implementations --> abstract syntax tree --> vector embedding.)

wherein the vector embedding is attached with a defect label, wherein the source code commit pair comprises a dictionary and a subsequent dictionary;
(See pp. 2-3, Sec. III, referring to “labeled data” and the “loss function”. On p. 3, left column, the f1 and f2 implementations are in a unique semantic class F; “the defect” is interpreted as code duplication under the loss function, i.e., the f1, f2 distance metric using the Siamese neural network. See also Section IV-B, p. 3, “Our detection of code duplicates.”, and p. 4 for the association with baseline vectors, a histogram of token occurrence.)
generating, by one or more computer processors, responsive to a new source code commit, a new vector embedding based on a set of log differences between the new source code commit and a preceding source code commit utilizing the created similarity model;
(See Sections IV-A and IV-B, referring to baseline vectors, a histogram of token occurrence.
If the f1 implementation produces v1 as a vector embedding, then the f2 implementation, as new code or as code compared against the baseline, is the new source code commit.)
generating, by one or more computer processors, a defect likelihood utilizing the generated new vector embedding; and
(See Fig. 1, comparing the vector embeddings v1 and v2 by cosine similarity, where the defect likelihood is interpreted as the distances given in the left column of p. 3.)
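For illustration only, the comparison of two vector embeddings v1 and v2 by cosine similarity, and the corresponding distance, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the sample vectors are hypothetical, and the zero-vector handling is a choice made for this example rather than something taken from the reference.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def cosine_distance(v1, v2):
    """Distance derived from the similarity: 0 for same direction, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(v1, v2)


# Hypothetical embeddings for two implementations f1 and f2.
v1 = [1.0, 2.0, 0.0]
v2 = [2.0, 4.0, 0.0]
print(cosine_distance(v1, v2))  # close to 0.0, since v2 points in the same direction as v1
```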
determining, by one or more computer processors, responsive to the generated defect likelihood exceeding a defect likelihood threshold, that the new source code commit contains defects.
(See Section IV-B, p. 3: “we calculate the performance at different distance thresholds and plot the result into a receiver operating characteristic (ROC) curve. The area under the curve (AUC)”)
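For illustration only, evaluating a distance-based decision at different thresholds, plotting the results as ROC points, and taking the area under the curve can be sketched as follows. This is a minimal sketch, not the reference's evaluation code; the function names and the sample distances and labels are assumptions, and the sketch requires at least one positive and one negative pair.

```python
def roc_points(distances, is_match):
    """Sweep distance thresholds; a pair is predicted a match when its distance <= threshold.

    Returns (false positive rate, true positive rate) points, starting at (0, 0).
    """
    positives = sum(1 for y in is_match if y)
    negatives = len(is_match) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(distances)):
        tp = sum(1 for d, y in zip(distances, is_match) if d <= threshold and y)
        fp = sum(1 for d, y in zip(distances, is_match) if d <= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points, which are ordered by increasing threshold."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical pair distances and ground-truth labels (True = same semantic class).
distances = [0.1, 0.2, 0.8, 0.9]
labels = [True, True, False, False]
print(area_under_curve(roc_points(distances, labels)))  # 1.0 for perfectly separated pairs
```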

Wehr shows the dictionary for each source code commit in a set of historical source code commits as vocabularies of tokens for vector embedding, but does not explicitly show the collected data in the dictionary as a commit level, an associated defect label, and associated logs, as recited in “wherein each dictionary comprises a commit level, an associated defect label, and associated logs”.
However, these differences are found only in the nonfunctional descriptive material and are not functionally involved in the steps recited. The steps of the Wehr reference, operating on vocabularies of histogram tokens, would be performed the same way regardless of naming. Thus, this descriptive material will not distinguish the claimed invention from the prior art in terms of patentability, see In re Gulack, 703 F.2d 1381, 1385, 217 USPQ 401, 404 (Fed. Cir. 1983); In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).

Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to label the token vocabularies as a commit level, an associated defect label, and associated logs because such data does not functionally relate to the steps/means in the method, product, or system claimed, and because the subjective interpretation of the data does not patentably distinguish the claimed invention.


As per Claim 2: Regarding,
2. The method of claim 1, wherein the defect likelihood represents a probability that a subsequent commit retains errors, bugs, and defects associated with the historical source code commit.
(See Sec. III, pp. 2-3, especially the top text portion in the left column of p. 3: the loss function, which penalizes high or low distance, reads on the recitation “defect likelihood represents a probability”.)
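For illustration only, a loss that penalizes a high distance for pairs in the same class and a low distance for pairs in different classes can be sketched with a generic contrastive loss. This is an assumption made for this example: the exact loss of the reference is not reproduced here, and the function name, the margin parameter, and the sample values are hypothetical.

```python
def contrastive_loss(distance, same_class, margin=1.0):
    """Generic contrastive loss for one pair (illustrative; not the reference's formula).

    Same-class pairs are penalized for a high distance; different-class pairs
    are penalized for a distance below the margin.
    """
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Hypothetical distances between embeddings of an f1, f2 pair.
print(contrastive_loss(0.5, same_class=True))   # 0.25: same class, yet the pair is apart
print(contrastive_loss(0.5, same_class=False))  # 0.25: different classes, yet the pair is close
print(contrastive_loss(2.0, same_class=False))  # 0.0: different classes and beyond the margin
```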

As per Claim 3: Regarding,
3. The method of claim 1, further comprising: 
sending, by one or more computer processors, an action based on the generated defect likelihood.
(i.e., producing a distance metric, the cosine similarity s of f1 and f2, based on vector embedding as in Fig. 1, p. 3)

As per Claim 4: Regarding,
4. The method of claim 3, wherein the action is a defect notification that includes a defect severity, the generated defect likelihood, affected source code commits, and one or more related solutions.
(i.e., producing a distance metric, the cosine similarity s of f1 and f2, based on vector embedding as in Fig. 1, p. 3, together with the evaluation of Sec. IV-B)

As per Claim 5: Regarding,
5. The method of claim 1, wherein the similarity model is a neural network.
(See Fig. 1: Structure of Siamese neural network)

As per Claim 6: Regarding,
6. The method of claim 5, wherein the neural network is a siamese network.
(See Fig. 1: Structure of Siamese neural network)


As per Claim 8: Regarding,
8. The method of claim 1, wherein generating the defect likelihood utilizing the generated new vector embedding, comprises:
generating, by one or more computer processors, a similarity score based on one or more comparisons between each historical vector embedding in a plurality of historical vector embeddings and the generated new vector embedding, wherein the similarity score is a numerical value denoting a level of similarity between the compared vector embeddings; and
(See Fig. 1 and Sec. IV-B, with reference to Table 1)
generating, by one or more computer processors, defect likelihood of the detected new commit utilizing generated similarity score, historical fixes, corrections, pull requests, and bug reports, wherein the defect likelihood is a probability that a commit contains defects associated with a historical commit.
(See Fig. 1, and the loss function based on the high and low distances resulting from f1 and the baseline of Table 1)

As per Claims 9-14: These claims are directed to a product having claimed functionality corresponding to the limitations recited in claims 1-6. They are rejected under the rationales set forth above for claims 1-6.

As per Claims 15-20: These claims are directed to a system having claimed functionality corresponding to the limitations recited in claims 1-6. They are rejected under the rationales set forth above for claims 1-6.

Allowable Subject Matter
Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
 	 
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Henkel et al., “Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces”, 2018, ACM, pages 163-172, uses a dictionary of word tokens for transforming to vectors for capturing syntax and semantic regularities in program code, and encodes them in the cosine distance between words.  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ted T Vo whose telephone number is (571)272-3706.  The examiner can normally be reached on 8am-4:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Zhen can be reached on (571) 272-3708.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

TTV
December 29, 2021
/Ted T. Vo/
Primary Examiner, Art Unit 2191