DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Regarding Korean Patent Application Nos. 10-2020-0046011 and 10-2020-0146332, receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement submitted on 03/31/2022 has been considered by the examiner.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because in line 1, it contains the implied phrase, “Disclosed is a…”, which should be avoided as described above. Correction is required.  See MPEP § 608.01(b).
The following guidelines illustrate the preferred layout for the specification of a utility application. These guidelines are suggested for the applicant’s use.
Arrangement of the Specification
As provided in 37 CFR 1.77(b), the specification of a utility application should include the following sections in order. Each of the lettered items should appear in upper case, without underlining or bold type, as a section heading. If no text follows the section heading, the phrase “Not Applicable” should follow the section heading:
(a) TITLE OF THE INVENTION.
(b) CROSS-REFERENCE TO RELATED APPLICATIONS.
(c) STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT.
(d) THE NAMES OF THE PARTIES TO A JOINT RESEARCH AGREEMENT.
(e) INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A READ-ONLY OPTICAL DISC, AS A TEXT FILE OR AN XML FILE VIA THE PATENT ELECTRONIC SYSTEM.
(f) STATEMENT REGARDING PRIOR DISCLOSURES BY THE INVENTOR OR A JOINT INVENTOR.
(g) BACKGROUND OF THE INVENTION.
(1) Field of the Invention.
(2) Description of Related Art including information disclosed under 37 CFR 1.97 and 1.98.
(h) BRIEF SUMMARY OF THE INVENTION.
(i) BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S).
(j) DETAILED DESCRIPTION OF THE INVENTION.
(k) CLAIM OR CLAIMS (commencing on a separate sheet).
(l) ABSTRACT OF THE DISCLOSURE (commencing on a separate sheet).
(m) SEQUENCE LISTING. (See MPEP § 2422.03 and 37 CFR 1.821 - 1.825). A “Sequence Listing” is required on paper if the application discloses a nucleotide or amino acid sequence as defined in 37 CFR 1.821(a) and if the required “Sequence Listing” is not submitted as an electronic document either on read-only optical disc or as a text file via the patent electronic system.

On page 1 of the instant specification, the “Acknowledgement” section in lines 4-7 should be moved after the “Cross-Reference to Prior Applications” section beginning at line 9.

Claim Objections
Claims 3, 5, and 9-13 are objected to because of the following informalities:  
In claim 3, line 4, “a word to be corrected” should read “the word to be corrected”
In claim 5, line 1, “when a word selected” should read “the candidate word selected”
In claim 9, line 1, “calculating of a distance” should read “calculating of the distance”
In claim 10, line 1, “calculating of a distance” should read “calculating of the distance”
In claim 10, line 2, “calculate a distance” should read “calculate the distance”
In claim 11, lines 1-2, “an entire sentence” should read “the sentence”
In claim 11, line 2, “a sentence” should read “the sentence”
In claim 12, line 1, “calculating of a distance” should read “calculating of the distance”
In claim 12, line 7, “a distance” should read “the distance”
In claim 13, line 7, “(softmax function)” should read “using the [[(]]softmax function[[)]]”
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.  The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.  The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.  Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because each such limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier.  These claim limitations, and the associated disclosure, are:
“input unit” [Claims 1-7; Fig. 1, input unit 101; page 10, line 19]
“correction target word check unit” [Claims 1-7; Fig. 1, correction target word check unit 102; page 10, line 20]
“candidate editing distance selection unit” [Claims 1-7; Fig. 1, candidate editing distance selection unit 103; page 10, line 22]
“mask candidate generation unit” [Claims 1-7; Fig. 1, mask candidate generation unit 104; page 10, line 24]
“correction word presentation unit” [Claims 1-7; Fig. 1, correction word presentation unit 105; page 10, line 26]
“statistical candidate word set construction unit” [Claim 2; Fig. 2, statistical candidate word set construction unit 21; page 11, line 5]
“context probability calculation unit” [Claim 2; Fig. 2, context probability calculation unit 22; page 11, line 7]
“error word presence/absence determination unit” [Claim 2; Fig. 2, error word presence/absence determination unit 23; page 11, line 9]
“editing distance calculation unit” [Claim 3; Fig. 3, editing distance calculation unit 31; page 11, line 15]
“correction candidate word set construction unit” [Claim 3; Fig. 3, correction candidate word set construction unit 32; page 11, line 17]
“candidate word distance value calculation unit” [Claim 3; Fig. 3, candidate word distance value calculation unit 33; page 11, line 20]

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-7 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.  The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1-7 are also rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the enablement requirement.  The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Claims 1-7 recite various units interpreted under 35 U.S.C. 112(f) as noted above.  These claimed elements include:
“input unit” [Claims 1-7; Fig. 1, input unit 101; page 10, line 19]
“correction target word check unit” [Claims 1-7; Fig. 1, correction target word check unit 102; page 10, line 20]
“candidate editing distance selection unit” [Claims 1-7; Fig. 1, candidate editing distance selection unit 103; page 10, line 22]
“mask candidate generation unit” [Claims 1-7; Fig. 1, mask candidate generation unit 104; page 10, line 24]
“correction word presentation unit” [Claims 1-7; Fig. 1, correction word presentation unit 105; page 10, line 26]
“statistical candidate word set construction unit” [Claim 2; Fig. 2, statistical candidate word set construction unit 21; page 11, line 5]
“context probability calculation unit” [Claim 2; Fig. 2, context probability calculation unit 22; page 11, line 7]
“error word presence/absence determination unit” [Claim 2; Fig. 2, error word presence/absence determination unit 23; page 11, line 9]
“editing distance calculation unit” [Claim 3; Fig. 3, editing distance calculation unit 31; page 11, line 15]
“correction candidate word set construction unit” [Claim 3; Fig. 3, correction candidate word set construction unit 32; page 11, line 17]
“candidate word distance value calculation unit” [Claim 3; Fig. 3, candidate word distance value calculation unit 33; page 11, line 20]

As described in the instant disclosure, each of these claimed components is part of a context-sensitive spelling error correction device using a masked language model (instant specification, page 9, lines 13-18).  “The masked language model calculates candidate words to come in a mask based on a context of a sentence with a language model learned at every step with a method of predicting the mask by masking a word of the sentence in a learning document.”  The examiner interprets “masked language model” to require a computer and instructions (e.g., software or firmware) executed by such a computer to run the masked language model.
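For illustration only, the training step quoted above (masking a word of a sentence in a learning document so that the model learns to predict it) can be sketched as follows.  This minimal sketch assumes whitespace tokenization and a literal `[MASK]` token; the function name `mask_word` and both assumptions are hypothetical and are not drawn from the instant specification.

```python
from typing import List, Tuple

MASK_TOKEN = "[MASK]"  # assumed mask symbol; the specification does not name one


def mask_word(tokens: List[str], index: int) -> Tuple[List[str], str]:
    """Replace tokens[index] with the mask token.

    Returns the masked token sequence and the original word, which serves as the
    prediction target when training a masked language model.
    """
    if not 0 <= index < len(tokens):
        raise IndexError("index outside the sentence")
    masked = list(tokens)  # copy so the caller's sentence is left unchanged
    target = masked[index]
    masked[index] = MASK_TOKEN
    return masked, target


sentence = "the cat sat on the mat".split()
masked, target = mask_word(sentence, 2)
print(masked)  # ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
print(target)  # sat
```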

As to the written description requirement, MPEP § 2161.01, subsection I, addresses how to determine whether there is adequate written description for a computer-implemented functional claim limitation, stating:
Similarly, original claims may lack written description when the claims define the invention in functional language specifying a desired result but the specification does not sufficiently describe how the function is performed or the result is achieved. For software, this can occur when the algorithm or steps/procedure for performing the computer function are not explained at all or are not explained in sufficient detail (simply restating the function recited in the claim is not necessarily sufficient). In other words, the algorithm or steps/procedure taken to perform the function must be described with sufficient detail so that one of ordinary skill in the art would understand how the inventor intended the function to be performed. See MPEP §§ 2163.02 and 2181, subsection IV. (emphasis added.)


	Further as to the written description requirement, MPEP § 2163.03, subsection VI, addresses written description issues arising from an indefinite means-plus-function limitation, stating:

A claim limitation expressed in means- (or step-) plus-function language "shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof." 35 U.S.C. 112(f)  or pre-AIA  35 U.S.C. 112, sixth paragraph. If the specification fails to disclose sufficient corresponding structure, materials, or acts that perform the entire claimed function, then the claim limitation is indefinite because the applicant has in effect failed to particularly point out and distinctly claim the invention as required by 35 U.S.C. 112(b)  or pre-AIA  35 U.S.C. 112, second paragraph. In re Donaldson Co., 16 F.3d 1189, 1195, 29 USPQ2d 1845, 1850 (Fed. Cir. 1994) (en banc). Such a limitation also lacks an adequate written description as required by 35 U.S.C. 112(a)  or pre-AIA  35 U.S.C. 112, first paragraph, because an indefinite, unbounded functional limitation would cover all ways of performing a function and indicate that the inventor has not provided sufficient disclosure to show possession of the invention. See also MPEP § 2181. (emphasis added.)

As to the enablement requirement, MPEP § 2164.06(c), subsection II, addresses how to determine whether there is adequate enablement for block elements within a computer, stating:
While no specific universally applicable rule exists for recognizing an insufficiently disclosed application involving computer programs, an examining guideline to generally follow is to challenge the sufficiency of disclosures that fail to include the programmed steps, algorithms or procedures that the computer performs necessary to produce the claimed function. These can be described in any way that would be understood by one of ordinary skill in the art, such as with a reasonably detailed flowchart which delineates the sequence of operations the program must perform. In programming applications where the software disclosure only includes a flowchart, as the complexity of functions and the generality of the individual components of the flowchart increase, the basis for challenging the sufficiency of such a flowchart becomes more reasonable because the likelihood of more than routine experimentation being required to generate a working program from such a flowchart also increases.

As stated earlier, once USPTO personnel have advanced a reasonable basis or presented evidence to question the adequacy of a computer system or computer programming disclosure, the applicant must show that the specification would enable one of ordinary skill in the art to make and use the claimed invention without resorting to undue experimentation. In most cases, efforts to meet this burden involve submitting affidavits, referencing prior art patents or technical publications, presenting arguments of counsel, or combinations of these approaches. (emphasis added.)

The claimed “input unit” is illustrated by the process performed by element 101 in Fig. 1 and described further on page 10, lines 19-20, noting that the “input unit” is configured for inputting a sentence for correcting an error.  The specification does not provide sufficient details about how a sentence is input, via the input unit, into the spelling error correction device using a masked language model, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.
The claimed “correction target word check unit” is illustrated by the process performed by element 102 in Fig. 1 and described further on page 10, lines 20-21, noting that the “correction target word check unit” is configured for sequentially checking words of a sentence input through the input unit 101 and checking whether there is an error in each word.  The specification does not provide sufficient details about how such a unit sequentially checks the words in a sentence to determine whether there is an error in a word, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  Moreover, the sub-units 21, 22, and 23 illustrated in Fig. 2 similarly do not provide sufficient details, as described further below.
The claimed “candidate editing distance selection unit” is illustrated by the process performed by element 103 in Fig. 1 and described further on page 10, lines 22-23, noting that the “candidate editing distance selection unit” is configured for selecting candidate words for a word to be corrected when it is determined that there is an error in the word.  The specification does not provide sufficient details about how candidate words are selected, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.
The claimed “mask candidate generation unit” is illustrated by the process performed by element 104 in Fig. 1 and described further on page 10, lines 24-25, noting that the “mask candidate generation unit” is configured for replacing the word to be corrected with a mask to calculate a distance value between a surrounding context of the mask and the candidate words.  The specification does not provide sufficient details about how the word to be corrected is replaced with a mask or how the distance value is then calculated, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  Moreover, the sub-units 31, 32, and 33 illustrated in Fig. 3 similarly do not provide sufficient details, as described further below.
The claimed “correction word presentation unit” is illustrated by the process performed by element 105 in Fig. 1 and described further on page 10, lines 25-26, noting that the “correction word presentation unit” is configured for presenting a candidate word close to the context.  The specification does not provide sufficient details about how a candidate word close to the context is identified and presented, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.
The claimed “statistical candidate word set construction unit” is illustrated by the process performed by element 21 in Fig. 2 and described further on page 11, lines 4-7, noting that the “statistical candidate word set construction unit” is configured for searching a 3-gram dictionary and searching for surrounding context words at a center word position and all appearing statistical candidate words to configure a statistical candidate word set.  The specification does not provide sufficient details about how such a 3-gram dictionary is selected and searched, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  
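As one hypothetical reading of this function, a 3-gram dictionary could be stored as counts keyed by (left word, center word, right word), and the statistical candidate word set could be every center word observed with the same left and right context.  The data layout, the name `statistical_candidates`, and the toy counts below are assumptions for illustration only; the specification does not state how its 3-gram dictionary is organized or searched.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical 3-gram dictionary: (left word, center word, right word) -> corpus count.
Trigram = Tuple[str, str, str]


def statistical_candidates(trigrams: Dict[Trigram, int], left: str, right: str) -> Counter:
    """Collect every center word observed between `left` and `right`, with its count."""
    candidates: Counter = Counter()
    for (l, center, r), count in trigrams.items():
        if l == left and r == right:
            candidates[center] += count
    return candidates


trigrams = {
    ("i", "ate", "an"): 40,
    ("i", "at", "an"): 2,
    ("i", "saw", "an"): 25,
    ("we", "ate", "an"): 7,
}
print(dict(statistical_candidates(trigrams, "i", "an")))
# {'ate': 40, 'at': 2, 'saw': 25}
```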
The claimed “context probability calculation unit” is illustrated by the process performed by element 22 in Fig. 2 and described further on page 11, lines 7-8, noting that the “context probability calculation unit” is configured for calculating a context probability of the statistical candidate word set constructed by the statistical candidate word set construction unit 21.  The specification does not provide sufficient details about how such a context probability is calculated, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.
The claimed “error word presence/absence determination unit” is illustrated by the process performed by element 23 in Fig. 2 and described further on page 11, lines 9-11, noting that the “error word presence/absence determination unit” is configured for determining whether there is an error word based only on whether an error check target word has a higher or lower context probability than the statistical candidate words in the candidate word set.  The specification does not provide sufficient details about how the presence or absence of an error word is determined, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.
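Continuing the same hypothetical, the sketch below estimates a context probability as a relative frequency within the candidate set and flags the target word as an error when any other candidate is more probable in that context.  Both the estimator and the decision rule are assumptions; they are only one of several plausible implementations, and the specification does not indicate which, if any, is intended.

```python
from typing import Dict


def context_probabilities(counts: Dict[str, int]) -> Dict[str, float]:
    """Relative-frequency estimate of P(word | left, right) over the candidate set."""
    total = sum(counts.values())
    if total == 0:
        return {word: 0.0 for word in counts}
    return {word: count / total for word, count in counts.items()}


def is_error_word(target: str, counts: Dict[str, int]) -> bool:
    """Flag `target` as an error if any other candidate is more probable in context.

    A target never observed in this context is given probability 0.0.
    """
    probs = context_probabilities(counts)
    target_prob = probs.get(target, 0.0)
    return any(p > target_prob for word, p in probs.items() if word != target)


counts = {"ate": 40, "saw": 25, "at": 2}
print(round(context_probabilities(counts)["ate"], 3))  # 0.597
print(is_error_word("at", counts))   # True
print(is_error_word("ate", counts))  # False
```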
The claimed “editing distance calculation unit” is illustrated by the process performed by element 31 in Fig. 3 and described further on page 11, lines 15-17, noting that the “editing distance calculation unit” is configured for calculating an editing distance with the entire dictionary words of the masked language model based on the word to be corrected.  The specification does not provide sufficient details about how to calculate an editing distance with the entire dictionary words of the masked language model based on the word to be corrected, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  
The claimed “correction candidate word set construction unit” is illustrated by the process performed by element 32 in Fig. 3 and described further on page 11, lines 17-19, noting that the “correction candidate word set construction unit” is configured for obtaining a correction candidate word satisfying an entire insertion word dictionary of the masked language model and a preset editing distance based on a central word.  The specification does not provide sufficient details about how to obtain a correction candidate word, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  
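A minimal sketch of the editing distance calculation and candidate set construction described in the two preceding paragraphs, assuming that “editing distance” means unit-cost Levenshtein distance and that the “preset editing distance” is a fixed upper bound chosen in advance (here, a hypothetical `max_distance` of 1).  The function names and the toy vocabulary are illustrative assumptions, not the applicant's disclosure.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def correction_candidates(word: str, dictionary: List[str], max_distance: int) -> List[str]:
    """Dictionary words within a preset edit distance of the word to be corrected."""
    return [entry for entry in dictionary if levenshtein(word, entry) <= max_distance]


vocab = ["cat", "cast", "cost", "dog"]
print(correction_candidates("cst", vocab, max_distance=1))  # ['cat', 'cast', 'cost']
```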
The claimed “candidate word distance value calculation unit” is illustrated by the process performed by element 33 in Fig. 3 and described further on page 11, lines 20-21, noting that the “candidate word distance value calculation unit” is configured for replacing the word to be corrected with a mask to calculate a distance value between a surrounding context of the mask and candidate words.  The specification does not provide sufficient details about how to replace the word to be corrected with a mask, and further does not provide details regarding the claimed elements in the form of an algorithm, software, or other functional code of the computer block disclosure.  
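The masking-and-scoring step might be sketched as follows, assuming that the “distance value” is the negative log of a softmax probability over candidate scores at the masked position.  A real masked language model is replaced here by a hypothetical lookup table (`TOY_LOGITS`, `toy_score`); the specification does not define how its distance value is computed.

```python
import math
from typing import Callable, Dict, List

MASK_TOKEN = "[MASK]"  # assumed mask symbol


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over candidate scores."""
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def candidate_distances(tokens: List[str], index: int, candidates: List[str],
                        score_fn: Callable[[List[str], int, str], float]) -> Dict[str, float]:
    """Mask tokens[index], score each candidate in that slot, return -log P as a distance."""
    masked = list(tokens)
    masked[index] = MASK_TOKEN
    probs = softmax({c: score_fn(masked, index, c) for c in candidates})
    return {c: -math.log(p) for c, p in probs.items()}


# Hypothetical stand-in for a masked language model's output scores (logits).
TOY_LOGITS = {"cat": 2.0, "cast": 0.5, "cost": -1.0}


def toy_score(masked: List[str], index: int, candidate: str) -> float:
    return TOY_LOGITS.get(candidate, -5.0)


sentence = "the cst sat on the mat".split()
distances = candidate_distances(sentence, 1, ["cat", "cast", "cost"], toy_score)
best = min(distances, key=distances.get)
print(best)  # cat
```

Under this assumed definition, a lower distance value means the candidate fits the surrounding context of the mask more closely.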

Since these units are disclosed as device block elements, the specification is required to disclose the software, an algorithm, or a flowchart of the operations of the units in sufficient detail.  The specification only discloses the operations of the units at a high level and does not provide the level of detail (e.g., code or an algorithm) necessary to indicate to one of ordinary skill in the art:
that the inventor(s), at the time the application was effectively filed, had possession of the claimed invention; or
how to make or use the invention without undue experimentation.

In sum, claims 1-7 fail to meet the written description requirement of 35 U.S.C. 112(a).  The lack of disclosure of the code or algorithms to implement the claimed units (as detailed above), in a manner understandable to a person of ordinary skill in the art, results in a failure to reasonably convey that the inventor(s), at the time the application was effectively filed, had possession of the claimed invention.

Further, claims 1-7 fail to meet the enablement requirement of 35 U.S.C. 112(a).  The lack of disclosure of the code or algorithms to implement the claimed units (as detailed above), in a manner understandable to a person of ordinary skill in the art, results in claimed subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1-13 are generally narrative and indefinite, failing to conform with current U.S. practice.  They appear to be a literal translation into English from a foreign document and are replete with grammatical and idiomatic errors.

Claims 1-7 recite various units interpreted under 35 U.S.C. 112(f) as noted above.
As described in the instant disclosure, each of these claimed components is part of a context-sensitive spelling error correction device using a masked language model (instant specification, page 9, lines 13-18).  “The masked language model calculates candidate words to come in a mask based on a context of a sentence with a language model learned at every step with a method of predicting the mask by masking a word of the sentence in a learning document.”  The examiner interprets “masked language model” to require a computer and instructions (e.g., software or firmware) executed by such a computer to run the masked language model.
As noted above in the rejection under 35 U.S.C. 112(a), the specification fails to adequately disclose an algorithm for performing the claimed specific functions of each of these units.  The specification also fails to detail any other structure to perform the claimed functions of these units.  As such, the following claimed units are indefinite under 35 U.S.C. 112(b) due to the specification’s failure to adequately disclose the algorithm or structural details:
“input unit” [Claims 1-7; Fig. 1, input unit 101; page 10, line 19]
“correction target word check unit” [Claims 1-7; Fig. 1, correction target word check unit 102; page 10, line 20]
“candidate editing distance selection unit” [Claims 1-7; Fig. 1, candidate editing distance selection unit 103; page 10, line 22]
“mask candidate generation unit” [Claims 1-7; Fig. 1, mask candidate generation unit 104; page 10, line 24]
“correction word presentation unit” [Claims 1-7; Fig. 1, correction word presentation unit 105; page 10, line 26]
“statistical candidate word set construction unit” [Claim 2; Fig. 2, statistical candidate word set construction unit 21; page 11, line 5]
“context probability calculation unit” [Claim 2; Fig. 2, context probability calculation unit 22; page 11, line 7]
“error word presence/absence determination unit” [Claim 2; Fig. 2, error word presence/absence determination unit 23; page 11, line 9]
“editing distance calculation unit” [Claim 3; Fig. 3, editing distance calculation unit 31; page 11, line 15]
“correction candidate word set construction unit” [Claim 3; Fig. 3, correction candidate word set construction unit 32; page 11, line 17]
“candidate word distance value calculation unit” [Claim 3; Fig. 3, candidate word distance value calculation unit 33; page 11, line 20]
	
Claim 1 is further rejected because the recited limitation “calculating an editing distance between a word to be corrected and a word dictionary to select a candidate word” is indefinite.  It would be unclear to a person of ordinary skill in the art how an editing distance is calculated between “a word to be corrected” and “a word dictionary,” and therefore the metes and bounds of the claim are unclear.  For example, the editing distance could be calculated with respect to all words in the “word dictionary,” a subset of the words in the “word dictionary,” or a single word in the “word dictionary.”  For purposes of examination, this limitation is being interpreted as “calculating an editing distance between a word to be corrected and a word in a word dictionary to select a candidate word.”
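To illustrate the ambiguity, the sketch below computes three different quantities, each of which could be described as an editing distance between a word to be corrected and a word dictionary.  Unit-cost Levenshtein distance, the toy dictionary, and the first-letter subset are assumptions chosen only for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


word = "cst"
dictionary: List[str] = ["cat", "cast", "dog", "cost", "cut"]

# Reading 1: a distance to every word in the dictionary.
all_words: Dict[str, int] = {w: levenshtein(word, w) for w in dictionary}

# Reading 2: distances to a subset only (here, hypothetically, words sharing the first letter).
subset: Dict[str, int] = {w: levenshtein(word, w) for w in dictionary if w[0] == word[0]}

# Reading 3: a distance to a single dictionary word (here, arbitrarily, the first entry).
single: int = levenshtein(word, dictionary[0])

print(all_words)  # {'cat': 1, 'cast': 1, 'dog': 3, 'cost': 1, 'cut': 1}
print(subset)     # {'cat': 1, 'cast': 1, 'cost': 1, 'cut': 1}
print(single)     # 1
```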

	Claims 2-7 depend from claim 1 and do not remedy the deficiencies set forth with respect to claim 1 and are therefore rejected under the same grounds as claim 1 above.

Claim 2 is further rejected because claim 2 recites the limitation "the candidate word set" in line 9.  There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this limitation is being interpreted as “the statistical candidate word set”.

Claim 3 is further rejected because claim 3 recites the limitation "the entire word dictionary" in lines 2-3 and lines 4-5.  There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this limitation is being interpreted as “the word dictionary”.

	Claim 3 is further rejected because the recited limitation, “calculates a preset editing distance” is indefinite.  It would be unclear to a person of ordinary skill in the art how a calculated “editing distance” could be “preset”, because if the edit distance is “preset” there is no need to perform a calculation, and therefore the metes and bounds of the claim are indefinite.  For purposes of examination, this limitation is being interpreted as “calculates an editing distance using a preset editing distance function”.

Claim 4 is further rejected because claim 4 recites the limitation "the entire dictionary word" in lines 2-3.  There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this limitation is being interpreted as “a word in a dictionary”.

Claim 4 is further rejected because claim 4 recites the limitation "the entire insertion word dictionary of the masked language model" in lines 5-6.  There is insufficient antecedent basis for this limitation in the claim. For purposes of examination, this limitation is being interpreted as “a word dictionary, utilized by the masked language model, that contains one or more inserted words”.

Claim 6 is further rejected because in line 2, w1:n is referred to as “an entire word”, whereas in claim 5, from which claim 6 depends, this variable refers to a sentence (with words from 1:N).  Claim 6 is therefore indefinite because a word can be a sentence only if the sentence is one word long, yet in claim 5 the sentence must contain a minimum of 3 words (w1, … C, … wN).  For purposes of examination, w1:n in claim 6 will be interpreted as referring to a sentence, with words 1:N, where N must be at least 3, consistent with the interpretation of claim 5.

Claim 6 is further rejected because in lines 3-4, C is referred to as “a set C of words selected by the candidate editing distance selection unit is replaced by the mask”, which is inconsistent with the use of the variable C in claim 5, from which claim 6 depends, where the variable C is referred to as a single word in the sentence w1:n.  Therefore, for purposes of examination, this limitation is being interpreted as “a word C in a set of words….”

Claim 6 is further rejected because claim 6 recites the limitation “the context” in line 4. There is insufficient antecedent basis for this limitation in the claim.  For purposes of examination, this limitation is being interpreted as “the entire context around the word to be corrected” as recited in claim 1, from which claim 6 depends.

Claim 7 is further rejected because claim 7 recites the limitation “the calculated distance value” in line 3. There is insufficient antecedent basis for this limitation in the claim.  For purposes of examination, this limitation is being interpreted as “the distance calculation value” as recited in claim 1, from which claim 7 depends.

Claim 7 is further rejected because it is unclear what the recited limitation “the word” in line 3 refers to, because it could refer to “a final correction word” or “a correction candidate word” as recited in claim 7, and could further refer to “a word to be corrected” or a “candidate word” as recited in claim 1.  Therefore, because the metes and bounds of the recited limitation “the word” in line 3 of claim 7 would be unclear to one of ordinary skill in the art due to various reasonable interpretations for “the word”, claim 7 is indefinite. For purposes of examination, “the word” in line 3 is being interpreted as “the final correction word”.

Claim 12 is rejected because claim 12 recites the limitation “the masking sentence” in line 4. There is insufficient antecedent basis for this limitation in the claim.  For purposes of examination, this limitation is being interpreted as “a sentence, where at least one word is masked, and represented by the masksentence variable”.

Claim 12 is further rejected because claim 12 recites “and a distance is calculated as a probability using a softmax function”, which is indefinite because it is unclear how a “distance” can be a “probability”, and therefore the metes and bounds of the claim, and the use of the softmax function, would be unclear to one of ordinary skill in the art. 
Where applicant acts as his or her own lexicographer to specifically define a term of a claim contrary to its ordinary meaning, the written description must clearly redefine the claim term and set forth the uncommon definition so as to put one reasonably skilled in the art on notice that the applicant intended to so redefine that claim term. Process Control Corp. v. HydReclaim Corp., 190 F.3d 1350, 1357, 52 USPQ2d 1029, 1033 (Fed. Cir. 1999). The term “softmax” in claim 12 is used by the claim to mean “a function that calculates a distance,” while the accepted meaning is “a function that converts a vector of K real numbers into a probability distribution of K possible outcomes, for example, mapping a vector of candidate words and associated distances to a probability distribution of likelihood of the candidate word being a correction in view of the associated distance.” The term is indefinite because the specification does not clearly redefine the term.  For purposes of examination, “and a distance is calculated as a probability using a softmax function” is interpreted to mean “and a distance is determined using a corresponding probability from the output of a softmax function”.  Furthermore, the candidatedistance variable is being interpreted as a distance, and not a probability, where the distance is determined using the MLM function.
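For reference, the accepted meaning of softmax noted above can be illustrated with a minimal Python sketch. The candidate words and raw scores are hypothetical and are not taken from the application or the cited art; the sketch only shows that softmax maps K real-valued scores to a probability distribution over K outcomes, rather than computing a distance itself.

```python
import math

def softmax(scores):
    """Map a list of K real-valued scores to a probability distribution over K outcomes."""
    # Subtracting the maximum score improves numerical stability without changing the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate words and hypothetical raw scores (illustration only).
candidates = ["duck", "deck", "dock"]
scores = [2.0, 1.0, 0.5]
probs = softmax(scores)
for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")
```

Every output lies between 0 and 1 and the outputs sum to 1, which is what distinguishes the softmax output from a distance value.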

	Claim 13 depends on claim 12 and is therefore rejected under the same grounds as claim 12 above.

Claim 13 is further rejected because in line 2, the limitation “a form of a vector representing all data in deep learning” is indefinite, because it is unclear what “all data in deep learning” refers to, and therefore the metes and bounds of the claim would be unclear to one of ordinary skill in the art.
Claim 13 is further rejected because in line 6, the limitation, “a parameter that calculates an optimal result learned in advance” is unclear.  First, it is unclear how a “parameter” of a masked language model could perform a calculation.  Second, calculating a result “learned in advance” is circular, and a person of ordinary skill in the art would not be able to understand how a calculation could produce a result that was learned in advance.
Further, the term “optimal” in claim 13 is a relative term which renders the claim indefinite. The term “optimal” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. 
For purposes of examination, “a parameter that calculates an optimal result learned in advance” is being interpreted as “a parameter, that has been selected based on learning to enable the masked language model to output a result”.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Claim 1 recites a device for correcting a context sensitive spelling error using a masked language model. Except for the claimed “masked language model” and use of the term “units,” under the broadest reasonable interpretation, the limitations cover performance in the human mind with the assistance of physical aids (e.g., pen and paper).  
an input unit for inputting a sentence for correction; (e.g., a human reading a sentence and searching for context spelling errors)
a correction target word check unit for checking an input sentence in units of a word and searching for a context spelling error; (e.g., a human reading a sentence and word-by-word searching for context spelling errors)
a candidate editing distance selection unit for calculating an editing distance between a word to be corrected and a word dictionary to select a candidate word; (e.g., a human calculating an editing distance, e.g., “some” and “same” have an edit distance = 1, utilizing a paper list of similar words or a physical word dictionary, and then choosing a candidate word, such as a word with a low edit distance)
a mask candidate generation unit for calculating a distance between an entire context around the word to be corrected and candidate words filtered by the candidate editing distance selection unit; and (e.g., a human masking a word to be corrected (e.g., covering the word with a post-it note), and then calculating a distance, such as a cosine similarity or even an edit distance, between the surrounding context words and the candidate word; alternatively, words can be annotated with values, such as position information, synonyms, etc., and a distance metric can be performed on those annotated values)
a correction word presentation unit for selecting a final correction word based on a distance calculation value.  (e.g., a human selecting a final corrected version of a word, based on the distance calculation performed in the preceding step).
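The edit-distance example given above (“some” and “same” have an edit distance of 1) can be checked with a short sketch of the standard Levenshtein distance, which counts unit-cost insertions, deletions, and substitutions. This is an illustrative implementation assumed for explanation only; it is not the applicant’s code or code from any cited reference.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j] (the previous row).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("some", "same"))  # one substitution: o -> a
```

The two-row table keeps memory proportional to the shorter dimension, which is the usual way this computation is written by hand as well.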
The judicial exception is not integrated into a practical application. While the claim recites a “masked language model”, the claim only recites the model at a high level of generality and the claim does not recite a computer-specific algorithm for training or using the model.  Therefore, the model is a simple computer automation of the process for determining spell corrections of words that could be performed by a human.  Accordingly, these elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Further, the use of the term “units” does not necessarily mean a computing unit, but to the extent “unit” is construed as a software implementation, such disclosure is merely a high-level generic computing function that does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. For example, BERT is a well-known, routine, and conventional masked language model.  Evidence that BERT is well-known, routine, and conventional includes:
US 20210174016 A1 (Fox) – Para. 0272 (“conventional embeddings generated, like BERT”)
US 20210109995 A1 (Mihindukulasooriy et al.) – Para. 0031 (“One having ordinary skill in the art will also appreciate that the context component 116 can, in various instances, employ any word embedding methodology now known or later created (e.g., Word2Vec, Continuous Bag of Words, Skip Gram, GloVe (Global Vectors), BERT (Bidirectional Encoder Representations from Transformers), other types of neural-network-based or machine-learning-based embedding techniques, and so on”). 
US 20210004439 A1 (Xiong et al.) – Para. 0023 (“forms of word embedding known in the art could be used as well, such as Bidirectional Encoder Representations from Transformers (BERT).”)

Claims 2-7 depend from claim 1 and do not remedy the deficiencies of claim 1 and are therefore rejected under the same grounds as claim 1 above.  
Claim 2 further recites limitations concerning the claimed “correction target word check unit”, but none of the additional limitations recited in claim 2 amount to anything more than the same or a similar abstract idea as recited in claim 1.  Such limitations merely recite searching a 3-gram dictionary (e.g., looking at a word list or physical dictionary), calculating a statistical candidate word set and context probabilities, and determining whether a word has an error, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
Claim 3 further recites limitations concerning the claimed “candidate editing distance selection unit”, but none of the additional limitations recited in claim 3 amount to anything more than the same or a similar abstract idea as recited in claim 1.  Such limitations merely recite determining an edit distance and using such edit distance to select a word to be corrected from a dictionary, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
Claim 4 further recites limitations concerning the claimed “mask candidate generation unit”, but none of the additional limitations recited in claim 4 amount to anything more than the same or a similar abstract idea as recited in claim 1.  Such limitations merely recite calculating edit distances and distances based on context and using such distances to determine candidate words for spelling error correction, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
Claims 5 and 6 further recite limitations for defining certain variables, function names, set names and values, and provide further details for selecting a word C to be considered for error correction, which amounts to nothing more than the same or a similar abstract idea as recited in claim 1. Such limitations merely recite calculating edit distances and distances based on context and using such distances to determine candidate words for spelling error correction, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
Claim 7 further recites limitations concerning the claimed “correction word presentation unit”, but none of the additional limitations recited in claim 7 amount to anything more than the same or a similar abstract idea as recited in claim 1.  Such limitations merely recite selecting a final correction word based on a calculated distance, which can be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.

	Claim 8 recites a method of correcting a context sensitive spelling error using a masked language model, where the method corresponds to the device of claim 1, and therefore claim 8 is rejected under the same or similar grounds as claim 1 above.
	Claims 9-13 depend from claim 8 and do not remedy the deficiencies of claim 8 and are therefore rejected under the same grounds as claim 8 above.
Claims 9-11 further recite limitations for calculating edit distances and context distance and using such distances to determine candidate words for spelling error correction, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
Claims 12 and 13 further recite limitations for defining certain variables, function names, set names and values, and further recite a “softmax function”, which amounts to nothing more than the same or a similar abstract idea as recited in claim 1. Such limitations merely recite calculating edit distances and distances based on context and using such distances to determine candidate words for spelling error correction, and using a softmax function, which can all be performed by a person mentally or via pen and paper, and therefore are mere mental processes and abstract ideas.
	In sum, none of dependent claims 2-7 and 9-13 (a) integrates the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amounts to significantly more than the judicial exception, because the additional limitations of using generic computer functions (e.g., “units”) amount to no more than mere instructions to apply the exception using generic computer functions.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 3-11 are rejected under 35 U.S.C. 103 as being unpatentable over Sun, Yifu, et al. "Contextual text denoising with masked language models." arXiv preprint arXiv:1910.14080 (2019), pp. 1-5, hereinafter referenced as SUN, in view of Zhang, Tianyi, et al. "Bertscore: Evaluating text generation with bert." arXiv preprint arXiv:1904.09675v.3 (Feb. 24, 2020), pp. 1-43, hereinafter ZHANG.

Regarding claim 1, SUN discloses:
A device for correcting a context sensitive spelling error (proposed text denoising method run on NVIDIA Tesla V100 GPUs; p. 3, section 3; noisy text includes misspelled words; p. 1, section 1) using a masked language model, (text denoising algorithm is based on the BERT masked language model; p. 1, section 1) the device comprising: 
an input unit for inputting a sentence for correction; (denoising algorithm cleans the words in the sentence in sequential order; p. 2, section 2; Algorithm 1 has input variable, noisy sentence x; p. 3, section 2.2)
a correction target word check unit for checking an input sentence in units of a word and searching for a context spelling error; (denoising algorithm cleans every word, e.g., checks each word unit for a context spelling error, in the sentence from left-to-right order by masking them in order, where masking is performed using Word Piece tokens; p. 2, section 2.2; the masked language model predicts the masked words based on the left and right contextual information, e.g., context spelling errors; p. 2, section 2.1; see example in Table 1, where “there is a fat dack swimming in the leake” is de-noised to correct spelling errors “there is a fat duck swimming in the lake”)
a candidate editing distance selection unit for calculating an editing distance between a word to be corrected and a word dictionary to select a candidate word; (list of candidate corrected words Vc, e.g., word dictionary of candidate words, is constructed by the BERT masked language model; pp. 2-3, section 2.2 and Algorithm 1; edit distance between candidate word w and noisy word xj is determined, e.g., editing distance; p. 3, section 2.2 and Algorithm 1)
a correction word presentation unit for selecting a final correction word based on a distance calculation value.  (Likely correct word is selected using the function argmin_{w ∈ Vc} E(w, xj), where E(w,xj) is the edit distance between w and noise word xj; p. 3, section 2.2 and Algorithm 1)
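The selection step attributed to SUN above, in which the output is the candidate w from Vc having the smallest edit distance E(w, xj) to the noisy word xj, can be sketched as follows. The candidate lists are hypothetical and hardcoded; in SUN they would come from the masked language model’s predictions for the masked position, which this sketch does not model. Resolving ties by keeping the first candidate in list order is an assumption of the sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def select_correction(noisy_word, candidates):
    """Return the candidate w minimizing E(w, noisy_word).
    Python's min() keeps the first minimal element, so ties go to list order."""
    return min(candidates, key=lambda w: edit_distance(w, noisy_word))

# Hypothetical candidate list Vc for the noisy word "dack" in the Table 1
# example sentence "there is a fat dack swimming in the leake".
vc = ["dog", "duck", "man", "boat"]
print(select_correction("dack", vc))
```

Because only the edit distance is compared at this step, the context enters solely through which candidates the masked language model places in Vc.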

	However, SUN fails to explicitly teach:
a mask candidate generation unit for calculating a distance between an entire context around the word to be corrected and candidate words filtered by the candidate editing distance selection unit

However, in a related field of endeavor, ZHANG discloses the BERTScore language generation metric based on pre-trained BERT contextual embeddings that computes the similarity of two sentences as a sum of cosine similarities between their tokens’ embeddings.  (p. 1, section 1).

	The SUN-ZHANG combination makes obvious:
a mask candidate generation unit for calculating a distance between an entire context around the word to be corrected and candidate words filtered by the candidate editing distance selection unit (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence where the masked token is replaced by words from the list of candidate corrected words Vc, in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between context around the masked token and the replacement words for the masked token, e.g., words from list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)
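The greedy-matching computation relied on from ZHANG pairs each token embedding in one sentence with its most cosine-similar token embedding in the other sentence, averages those maxima to obtain recall and precision, and combines them as an F1 score. The sketch below assumes small hand-written vectors in place of BERT contextual embeddings, so its values are illustrative only; producing real contextual embeddings would require a pretrained BERT model, which the sketch does not load, and it omits any additional weighting of tokens.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bertscore_f1(ref_emb, cand_emb):
    """Greedy-matching F1 between two sequences of token embeddings.
    Recall: each reference token is matched to its most similar candidate token.
    Precision: each candidate token is matched to its most similar reference token."""
    recall = sum(max(cosine(r, c) for c in cand_emb) for r in ref_emb) / len(ref_emb)
    precision = sum(max(cosine(r, c) for r in ref_emb) for c in cand_emb) / len(cand_emb)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 3-dimensional vectors standing in for contextual token embeddings.
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
candidate_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.1, 1.0]]  # close to reference
candidate_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # third token differs
print(bertscore_f1(reference, candidate_a), bertscore_f1(reference, candidate_b))
```

In the combination described above, each candidate sentence would be the input sentence with the masked position filled by a word from Vc, so a higher score indicates a replacement word that better fits the surrounding context.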

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZHANG with SUN.  As disclosed in ZHANG, one of ordinary skill in the art would be motivated to utilize the teachings of ZHANG because the ZHANG BERTScore technique addresses common pitfalls with n-gram based metrics by using contextualized embeddings to effectively capture distant dependencies and sentence ordering.  (ZHANG, p. 1, section 1).  One of ordinary skill would further be motivated to utilize the teachings of ZHANG because the ZHANG BERTScore technique has been evaluated to highly correlate with human evaluations in tasks such as image captioning and paraphrasing, demonstrating that it is effectively utilizing the contextual embeddings to robustly predict the similarity between reference and candidate sentences. (ZHANG, p. 1, section 1).  
	Finally, the examiner notes that both SUN and ZHANG utilize the same BERT masked language model (e.g., both cite to Devlin, et al. “BERT: Pre-training of deep bidirectional transformers for language understanding”), and therefore one of ordinary skill in the art would understand and find it straightforward to utilize SUN and ZHANG together in combination.

Regarding claim 3, SUN in view of ZHANG discloses the device of claim 1.  SUN further discloses:
wherein the candidate editing distance selection unit calculates a preset editing distance for insertion, deletion, and exchange between words in the entire word dictionary using in the masked language model used for correcting context sensitive spelling errors and a word to be corrected to select the corresponding word from the entire word dictionary. (edit distance E(w,xj), e.g., preset editing distance function between word w from a list of candidate words from Vc, e.g., the word dictionary, and noisy word xj, e.g., the word to be corrected, is determined to choose output xi; p. 3, section 2.2 and Algorithm 1)

Regarding claim 4, SUN in view of ZHANG discloses the device of claim 1, including the “wherein the mask candidate generation unit” (see claim 1).  SUN in view of ZHANG further makes obvious:
an editing distance calculation unit for calculating an editing distance with the entire dictionary word of the masked language model based on the word to be corrected; (SUN teaches edit distance E(w,xj), e.g., preset editing distance function between word w from a list of candidate words from Vc, e.g., the word dictionary, and noisy word xj, e.g., the word to be corrected xi; SUN, p. 3, section 2.2 and Algorithm 1; ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; the ZHANG-SUN combination now has BERTScore also utilize the edit distance functionality of SUN to calculate an edit distance at a word-level in addition to calculating a cosine similarity at the sentence-level; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)
a correction candidate word set construction unit for obtaining a correction candidate word that satisfies an editing distance set with the entire insertion word dictionary of the masked language model based on a central word; and (SUN discloses that list of candidate words Vc is determined by BERT, a masked language model, and that a noisy word xj, e.g., a central word, has its distance to the candidate word w from Vc calculated and compared; SUN, p. 3, section 2.2 and Algorithm 1; ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; the ZHANG-SUN combination now has a BERTScore layer, which is used in connection with BERT as used in both ZHANG and SUN, evaluate the list of candidate words Vc to filter candidate words that satisfy an editing distance threshold, e.g., words must have an edit distance less than a threshold number to be part of Vc; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)
a candidate word distance value calculation unit for replacing the word to be corrected with a mask to calculate a distance value between a context around the mask and the candidate words. (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence where the masked token is replaced by words from the list of candidate corrected words Vc, in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between context around the masked token and the replacement words for the masked token, e.g., words from list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

Regarding claim 5, SUN in view of ZHANG discloses the device of claim 1.  SUN in view of ZHANG further makes obvious:
wherein, when a word selected by the candidate editing distance selection unit for an input sentence w1:n is C, (Algorithm 1 has input variable, noisy sentence x, where x is a sentence that can be at least 3 words, such as “there is a fat dack swimming in the leake”, where each word is denoised sequentially, e.g., C can be a candidate word that is not the first or last word in the sentence; SUN, p. 3, section 2.2) the mask candidate generation unit selects C in which a context sensitive spelling error correction distance value 
[formula image: media_image2.png]
is the maximum, and
[formula image: media_image3.png]
is defined. (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence, where a greedy matching algorithm is used to maximize the matching similarity score; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence where the masked token is replaced by words, e.g., C, from the list of candidate corrected words Vc, in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the maximum score correlated to the distance between context around the masked token and the replacement words for the masked token, e.g., words from list of candidate corrected words Vc in SUN, where the determined word, C^, is the word whose BERTScore, as determined by the greedy algorithm that maximizes the matching similarity score, is the maximum, e.g., an argmax function over the set of potential distances considered by BERTScore; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

Regarding claim 6, SUN in view of ZHANG discloses the device of claim 5.  SUN in view of ZHANG further makes obvious:
wherein in the input sentence, when a word wt to be corrected is masked, an entire word w1:n of the sentence is
[formula image: media_image4.png]
and a set C of words selected by the candidate editing distance selection unit is replaced by the mask, and a distance value between the context and the selected word is calculated. (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence where the masked token is replaced by words from the list of candidate corrected words Vc, in SUN, e.g., [formula image: media_image4.png], and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between context around the masked token and the replacement words for the masked token, e.g., words from list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)
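Under the interpretation applied to claims 5 and 6 (the word wt to be corrected in sentence w1:n is replaced by a mask, each candidate word C from the edit-distance selection is substituted into the masked position, and the candidate with the maximum context distance value is selected), the procedure can be sketched as below. The `[MASK]` token string and the scoring function `toy_context_score` are hypothetical placeholders; in the SUN-ZHANG combination the score would be a BERTScore-style comparison of contextual embeddings, which this sketch does not compute.

```python
MASK = "[MASK]"  # assumed mask token string (hypothetical)

def mask_word(words, t):
    """Return a copy of the sentence with the word at index t replaced by the mask."""
    return words[:t] + [MASK] + words[t + 1:]

def fill_mask(masked, candidate):
    """Replace the mask token with a candidate word."""
    return [candidate if w == MASK else w for w in masked]

def select_candidate(words, t, candidates, context_score):
    """Return the candidate C that maximizes context_score(filled sentence, t)."""
    masked = mask_word(words, t)
    return max(candidates, key=lambda c: context_score(fill_mask(masked, c), t))

# Hypothetical stand-in for a context distance value: it simply rewards words
# from a hand-written list assumed to fit "... swimming in the ...".
def toy_context_score(sentence, t):
    compatible = {"duck", "fish", "swan"}
    return 1.0 if sentence[t] in compatible else 0.0

sentence = "there is a fat dack swimming in the leake".split()
print(select_candidate(sentence, 4, ["dock", "duck", "deck"], toy_context_score))
```

Swapping in a real contextual scorer only changes `context_score`; the masking, substitution, and argmax steps stay the same.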

Regarding claim 7, SUN in view of ZHANG discloses the device of claim 1.  SUN further discloses:
wherein the correction word presentation unit determines a correction candidate word with a highest value among correction candidate words to a final correction word based on the calculated distance value, and presents the word as a substitute. (Algorithm 1 uses the masked language model to construct candidate list Vc, e.g., a list of correction candidate words, then performs an edit distance calculation on all candidate words in Vc using the function 
$w_c = \arg\min_{w \in V_c} E(w, x_j)$
, where E(w,xj) is the edit distance between w (from the candidate list Vc) and noise word xj, where the minimum distance corresponds to the highest likelihood of being the correct word, e.g., the highest value among correction candidate words is the final correction word, and then output xi is determined to be the word w with the smallest distance; pp. 2-3, section 2.2)
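The selection step attributed to SUN's Algorithm 1 above (choose the word in Vc with the smallest edit distance E(w, xj) to the noisy word) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not SUN's code: E is assumed to be the Levenshtein distance, and the candidate list is passed in directly rather than produced by the BERT masked language model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed form of E): insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def select_correction(noisy_word: str, candidates) -> str:
    """w_c = argmin over Vc of E(w, x_j); min() keeps the first word on ties."""
    return min(candidates, key=lambda w: edit_distance(w, noisy_word))
```

For example, select_correction("speling", ["selling", "spilling", "spelling"]) returns "spelling", the only candidate at distance 1. Breaking ties by list order is an assumption of this sketch.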

Regarding claim 8, SUN discloses:
A method of correcting a context sensitive spelling error using a masked language model, the method comprising: (proposed text denoising method is based on the BERT masked language model, where noisy text includes misspelled words; p. 1, section 1, p. 3, section 3)
determining a spelling error correction target word by checking a sentence in units of a word; (denoising algorithm cleans the words in the sentence in sequential order; p. 2, section 2; Algorithm 1 has input variable, noisy sentence x; p. 3, section 2.2)
selecting a word through calculation of an editing distance between a word to be corrected and dictionary words in a language model to be a candidate word; (list of candidate corrected words Vc, e.g., word dictionary of candidate words, is constructed by the BERT masked language model; pp. 2-3, section 2.2 and Algorithm 1; edit distance between candidate word w and noisy word xj is determined, e.g., editing distance; p. 3, section 2.2 and Algorithm 1)
presenting a final correction word based on ranked information. (Likely correct word is selected using the function: 
$w_c = \arg\min_{w \in V_c} E(w, x_j)$
, where E(w,xj) is the edit distance between w and noise word xj; p. 3, section 2.2 and Algorithm 1)

However, SUN fails to explicitly teach:
calculating a distance between all selected words to replace a masking place and an entire surrounding context using a sentence masked with the word to be corrected in an input sentence; and

However, in a related field of endeavor, ZHANG discloses the BERTScore language generation metric based on pre-trained BERT contextual embeddings that computes the similarity of two sentences as a sum of cosine similarities between their tokens’ embeddings.  (ZHANG, p. 1, section 1).

The combination of SUN in view of ZHANG makes obvious:
calculating a distance between all selected words to replace a masking place and an entire surrounding context using a sentence masked with the word to be corrected in an input sentence (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of ZHANG with SUN.  As disclosed in ZHANG, one of ordinary skill in the art would be motivated to utilize the teachings of ZHANG because the ZHANG BERTScore technique addresses common pitfalls with n-gram based metrics by using contextualized embeddings to effectively capture distant dependencies and sentence ordering.  (ZHANG, p. 1, section 1).  One of ordinary skill would further be motivated to utilize the teachings of ZHANG because the ZHANG BERTScore technique has been evaluated to highly correlate with human evaluations in tasks such as image captioning and paraphrasing, demonstrating that it is effectively utilizing the contextual embeddings to robustly predict the similarity between reference and candidate sentences. (ZHANG, p. 1, section 1).  
	Finally, the examiner notes that both SUN and ZHANG utilize the same BERT masked language model (e.g., both cite to Devlin, et al., “BERT: Pre-training of deep bidirectional transformers for language understanding”), and therefore one of ordinary skill in the art would understand and find it straightforward to utilize SUN and ZHANG together in combination.

Regarding claim 9, SUN in view of ZHANG discloses the method of claim 8.  The combination of SUN in view of ZHANG further makes obvious:
wherein the calculating of a distance comprises calculating a distance value of the sentence through the masked language model when words selected through calculation of an editing distance with an entire context of a sentence including a word to be corrected are included in the sentence as correction candidates. (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, e.g., correction candidates, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

Regarding claim 10, SUN in view of ZHANG discloses the method of claim 8.  SUN in view of ZHANG further makes obvious:
wherein the calculating of a distance comprises inputting a masked sentence to the masked language model to calculate a distance between the selected words and the context and to rank the selected words according to the distance. (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; ZHANG further discloses using BERTScore for ranking; ZHANG, p. 5, section 3, and pp. 15-16, appendix A; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN, where the words are ranked according to the ranking disclosed in ZHANG; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3, and pp. 15-16, appendix A)

Regarding claim 11, SUN in view of ZHANG discloses the method of claim 8, including the “wherein context sensitive spelling error correction of an entire sentence comprises” limitation (see claim 8).  The combination of SUN in view of ZHANG further makes obvious:
checking errors from a first word to a last word of a sentence, (SUN discloses a denoising algorithm that cleans the words in the sentence in sequential order; SUN, pp. 2-3, section 2, Algorithm 1)
obtaining a set of selected candidate words based on a preset editing distance for a word determined to have errors, (SUN discloses that a list of candidate corrected words Vc, e.g., set of selected candidate words, is constructed by the BERT masked language model; SUN, pp. 2-3, section 2.2 and Algorithm 1; SUN further discloses determining an edit distance between candidate word w and noisy word xj, e.g., editing distance; SUN, p. 3, section 2.2 and Algorithm 1; SUN further discloses filtering a candidate list to select a candidate from the list; SUN, p. 2, section 2; ZHANG discloses prior art edit-distance based metrics, including the Levenshtein edit distance metric with normalization; ZHANG, p. 3, section 2.2; the SUN-ZHANG combination now adds an output layer to BERT, similar to how ZHANG appends layers to BERT to determine BERTScore, to further filter the list of candidate corrected words Vc of SUN, e.g., set of selected candidate words, where filtering is performed based on an edit distance as disclosed in both SUN and ZHANG, e.g., preset editing distance, and where filtering requires an edit distance score below a certain threshold for a noisy word xj, e.g., a word determined to have errors; SUN, p. 1, section 1, pp. 2-3, sections 2-2.2 and Algorithm 1 with ZHANG, p. 1, section 1, p. 2, section 2.2 and pp. 3-5, section 3)
calculating a distance value between the entire context and each candidate word to rank the words, and (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)
presenting a final correction word, and (SUN discloses that the likely correct word, e.g., final correction word, is selected using the function: 
$w_c = \arg\min_{w \in V_c} E(w, x_j)$
, where E(w,xj) is the edit distance between w and noise word xj; SUN, p. 3, section 2.2 and Algorithm 1; ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN, and now the SUN-ZHANG combination determines an output based on the BERTScore in addition to the output from SUN, e.g., final correction word; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

However, SUN in view of ZHANG does not explicitly teach:
when the final correction word and the word to be corrected are the same, it is determined that there is no error, and when the final correction word and the word to be corrected are different, it is determined that there is an error and the correction word is replaced.

However, the disclosure of Algorithm 1 in SUN is prima facie equivalent to the limitation, “when the final correction word and the word to be corrected are the same, it is determined that there is no error, and when the final correction word and the word to be corrected are different, it is determined that there is an error and the correction word is replaced.”  See MPEP 2183.

Algorithm 1 in SUN provides pseudocode explaining that input sentence x, represented as a vector with elements xi, is processed by the algorithm, where the output is a de-noised sentence x in which individual element xi is replaced.  The examiner finds that the assignment statement, xi = wc, is equivalent to “when the final correction word and the word to be corrected are the same, it is determined that there is no error, and when the final correction word and the word to be corrected are different, it is determined that there is an error and the correction word is replaced.”  If xi == wc (where “==” is the Boolean operator for equivalency), then the assignment xi = wc does not change the value assigned to xi, and therefore there is no error and the output sentence x is exactly the same as the input sentence x because there was no change in value to xi.  If xi != wc (where “!=” is the Boolean operator for non-equivalency), then the assignment xi = wc does change the value assigned to xi, and therefore there is an error and the output sentence x is not the same as the input sentence x because there was a change in value to xi.
The examiner notes that while Algorithm 1 could include a conditional statement prior to the assignment line, xi = wc, for example, “if (xi != wc) { xi = wc }”, to clarify that an assignment only takes place if there is an error (e.g., to literally match the claim language), the omission of such an explicit if-statement in the pseudocode is functionally equivalent to, and insubstantially different from, simply performing the assignment itself.
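To illustrate the examiner's point, the following Python sketch walks a sentence from the first word to the last, filters candidates by a preset edit-distance threshold, ranks the survivors with a caller-supplied context score, and replaces a word only when the top-ranked word differs from it. The candidate source, the scoring function, and the threshold of 2 are hypothetical assumptions for illustration, not code from SUN or the application. Dropping the `if best != word` test and always writing `out[i] = best` would yield the same corrected sentence; the comparison matters only for recording where an error was found.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def correct_sentence(words, candidates_for, context_score, max_distance=2):
    """Check words first to last; return (corrected words, error positions).

    candidates_for(words, i) -> candidate words for position i (hypothetical source)
    context_score(words, i, w) -> higher means w fits the context better (hypothetical)
    """
    out = list(words)
    errors = []
    for i, word in enumerate(words):
        # Keep only candidates within the preset editing distance of the word.
        pool = [w for w in candidates_for(out, i)
                if edit_distance(w, word) <= max_distance]
        if not pool:
            continue
        # Rank by context score; the top-ranked word is the final correction word.
        best = max(pool, key=lambda w: context_score(out, i, w))
        if best != word:  # different: an error, so replace the word
            out[i] = best
            errors.append(i)
        # same: no error; an unconditional out[i] = best would change nothing
    return out, errors
```

For example, with candidates ["read", "red", "rod"] at position 1 and toy scores favoring "read", the sentence ["I", "red", "a", "book"] becomes ["I", "read", "a", "book"] with an error recorded at position 1.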

The examiner further notes that SUN refers to GitHub code (see page 4, notes 7-9), and that the authors of ZHANG are from the Department of Computer Science at Cornell University, and therefore finds that a person of ordinary skill in the art would have some knowledge of computer science, pseudocode, and the use of if-statements and variable assignments, which are basic computer science and programming concepts.

Moreover, the examiner finds that it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to modify Algorithm 1 to add a conditional statement prior to the assignment line, xi = wc, for example, “if (xi != wc) { xi = wc }”, and that a person of ordinary skill would be motivated to do so depending on the type of programming language used for implementation, e.g., if using a hardware-efficient static language such as C, the implementer may wish to avoid an unnecessary assignment where possible, and therefore may elect to use a conditional statement, whereas if using a more developer-friendly dynamic language such as Python, the implementer may prefer to improve the readability of the code and reduce the lines of code by omitting the explicit if-statement, because such an if-statement is unnecessary and functionally equivalent to the assignment xi = wc.  The examiner further notes that SUN discloses Python language code (see page 7, note 9, referencing “pytorch”), and therefore finds that a person of ordinary skill in the art would understand at least the benefits of the Python programming language and its conventions.

As stated in MPEP 2183, now that the examiner has made a prima facie showing of equivalency, the burden is on applicant to rebut this showing, as explained in the following excerpt from MPEP 2183:
The burden then shifts to applicant to show that the element shown in the prior art is not an equivalent of the structure, material or acts disclosed in the application. In re Mulder, 716 F.2d 1542, 219 USPQ 189 (Fed. Cir. 1983). No further analysis of equivalents is required of the examiner until applicant disagrees with the examiner’s conclusion, and provides reasons why the prior art element should not be considered an equivalent. See also In re Walter, 618 F.2d 758, 768, 205 USPQ 397, 407-08 (CCPA 1980) (treating 35 U.S.C. 112, sixth paragraph, in the context of a determination of statutory subject matter and noting "[i]f the functionally-defined disclosed means and their equivalents are so broad that they encompass any and every means for performing the recited functions . . . the burden must be placed on the applicant to demonstrate that the claims are truly drawn to specific apparatus distinct from other apparatus capable of performing the identical functions"); In re Swinehart, 439 F.2d 210, 212-13, 169 USPQ 226, 229 (CCPA 1971) (treating as improper a rejection under 35 U.S.C. 112, second paragraph, of functional language, but noting that "where the Patent Office has reason to believe that a functional limitation asserted to be critical for establishing novelty in the claimed subject matter may, in fact, be an inherent characteristic of the prior art, it possesses the authority to require the applicant to prove that the subject matter shown to be in the prior art does not possess the characteristics relied on"); In re Fitzgerald, 619 F.2d 67, 205 USPQ 594 (CCPA 1980) (indicating that the burden of proof can be shifted to the applicant to show that the subject matter of the prior art does not possess the characteristic relied on whether the rejection is based on inherency under 35 U.S.C. 102  or obviousness under 35 U.S.C. 103 ). (emphasis added).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over SUN in view of ZHANG and further in view of Nguyen, Thi-Tuyet-Hai, et al. "Adaptive edit-distance and regression approach for post-OCR text correction." International Conference on Asian Digital Libraries. (2018) pp. 278-289, hereinafter referenced as NGUYEN.

Regarding claim 2, SUN in view of ZHANG discloses the device of claim 1, including the claimed “wherein the correction target word check unit”.  However, SUN in view of ZHANG fails to explicitly teach:
a statistical candidate word set construction unit for searching for a 3-gram dictionary and searching for surrounding context words at a center word position and all appearing statistical candidate words to configure a statistical candidate word set;
a context probability calculation unit for calculating a context probability of statistical candidate words of the statistical candidate word set construction unit; and
an error word presence/absence determination unit for determining whether there is an error word based on whether an error check target word has a higher or lower context probability than that of the statistical candidate words in the candidate word set.

	However, in a related field of endeavor, NGUYEN discloses post-OCR text correction techniques, specifically error correction techniques for generating and ranking candidates for correction.  (pp. 278-279, section 1).

	The SUN-ZHANG-NGUYEN combination makes obvious:
a statistical candidate word set construction unit for searching for a 3-gram dictionary and searching for surrounding context words at a center word position and all appearing statistical candidate words to configure a statistical candidate word set; (NGUYEN discloses generating candidate words and then scoring the candidates utilizing a trained statistical language model and 3-grams, where the training dataset is the “One Billion Word Language Model Benchmark”; NGUYEN, p. 280, section 3.1 and pp. 283-284, section 3.2; ZHANG discloses n-gram matching approaches utilizing a corpus; ZHANG, p. 2, section 2.1; SUN discloses an input sentence and using an n-gram mask; SUN, p. 2, section 2.2; the SUN-ZHANG-NGUYEN combination now utilizes NGUYEN to build a statistical language model (SLM), e.g., a statistical candidate word set, utilizing the training set of NGUYEN and the n-gram corpus of ZHANG, e.g., the 3-gram dictionary, where the SLM includes probabilities and weights for candidate words from the SUN input at a center position, e.g., the middle word in the 3-gram, and the probabilities of the surrounding context words, e.g., the first and last words in the 3-gram; SUN, p. 2, section 2.2, with ZHANG, p. 2, section 2.1 with NGUYEN, p. 280, section 3.1 and pp. 283-284, section 3.2)
a context probability calculation unit for calculating a context probability of statistical candidate words of the statistical candidate word set construction unit; and (NGUYEN discloses utilizing context probabilities, including probability of 3-length sequences related to errors and probability of n-gram candidates for each word in the statistical language model; NGUYEN, p. 280, section 2 and pp. 284-285, section 3.3; the SUN-ZHANG-NGUYEN combination now utilizes NGUYEN to build a statistical language model (SLM), e.g., a statistical candidate word set, and apply probability of 3-length sequences related to errors and probability of n-gram candidates for each word in the statistical language model, where the candidate words are applied to the SUN input sentence; SUN, p. 2, section 2.2 and Algorithm 1 with NGUYEN, p. 280, section 2 and pp. 284-285, section 3.3)
an error word presence/absence determination unit for determining whether there is an error word based on whether an error check target word has a higher or lower context probability than that of the statistical candidate words in the candidate word set. (NGUYEN discloses that the context probabilities, e.g., probability of 3-length sequences related to errors and probability of n-gram candidates for each word, are used as features to predict the confidence that a candidate word requires correction, where the candidates with the highest confidence rating are suggested for correction; NGUYEN, pp. 284-285, sections 3.3 and 3.4; the SUN-ZHANG-NGUYEN combination now utilizes NGUYEN to build a statistical language model (SLM), e.g., a statistical candidate word set, and further uses the NGUYEN context probabilities and feature ranking to determine candidates with the highest confidence, e.g., an error check target word has a higher context probability than that of the statistical candidate words in the candidate word set, as applied to the input sentence of SUN; SUN, p. 2, section 2.2 and Algorithm 1 with NGUYEN, pp. 284-285, sections 3.3 and 3.4)
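As a simplified illustration of the three claimed units as mapped above, the sketch below builds the statistical candidate set from a 3-gram dictionary (every center word observed between the same left and right context words), computes each word's context probability as its share of those 3-gram counts, and flags the target word when some candidate has a higher context probability. The TRIGRAMS counts are invented for illustration, and the plain comparison stands in for NGUYEN's regression-based confidence ranking; it is not NGUYEN's method.

```python
# Hypothetical 3-gram dictionary: (left, center, right) -> corpus count.
TRIGRAMS = {
    ("I", "read", "a"): 50,
    ("I", "need", "a"): 30,
    ("I", "red", "a"): 1,
    ("a", "good", "book"): 12,
}


def statistical_candidates(left, right):
    """Center words seen between `left` and `right`, with their counts."""
    return {ctr: n for (lft, ctr, rgt), n in TRIGRAMS.items()
            if lft == left and rgt == right}


def context_probability(left, word, right):
    """Share of the (left, *, right) 3-gram counts held by `word`."""
    cands = statistical_candidates(left, right)
    total = sum(cands.values())
    return cands.get(word, 0) / total if total else 0.0


def is_error(words, i):
    """Flag words[i] if a statistical candidate has a higher context probability."""
    left = words[i - 1] if i > 0 else "<s>"
    right = words[i + 1] if i + 1 < len(words) else "</s>"
    cands = statistical_candidates(left, right)
    if not cands:
        return False  # the dictionary offers no evidence either way
    target = context_probability(left, words[i], right)
    return any(context_probability(left, c, right) > target for c in cands)
```

Here "red" in "I red a book" is flagged because "read" and "need" both have higher context probabilities between "I" and "a", while "read" in the same position is not flagged.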

Therefore, it would have been obvious to one of ordinary skill before the effective filing date of the present application to utilize the teachings of NGUYEN with SUN and ZHANG.  As disclosed in NGUYEN, one of ordinary skill would be motivated to use the teachings of NGUYEN to perform post-processing in OCR to correct misspellings in the OCR process without requiring human intervention and to further utilize contextual-based approaches to remedy the issues with dictionary-lookup approaches.  (NGUYEN, pp. 278-279, section 1).  As disclosed in NGUYEN, one of ordinary skill would be further motivated to take advantage of the multi-prong approach of NGUYEN, using multiple models to train a regression model.  (NGUYEN, p. 280, section 2).

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over SUN in view of ZHANG and further in view of Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018), pp. 1-16, hereinafter referenced as DEVLIN.

Regarding claim 12, SUN in view of ZHANG discloses the method of claim 8.  SUN in view of ZHANG further makes obvious:
wherein in the calculating of a distance, a distance value to the context of each candidate word of entire candidate words is obtained as

[equation image: media_image5]

by inputting the masking sentence and the N number of correction candidate words using an MLM function (ZHANG discloses utilizing the BERT masked language model with token embeddings and calculating a score, using cosine similarity distance metrics, to compute the similarity between a reference sentence and a candidate sentence; ZHANG, p. 1, section 1, and pp. 3-5, section 3; SUN discloses using the BERT masked language model; SUN, p. 1, section 1; the SUN-ZHANG combination now uses the BERT masked language model with token embeddings, and applies the BERTScore technique of ZHANG, where the reference sentence is the input sentence of SUN and the candidate sentence is a masked version of the reference sentence in which the masked token is replaced by words from the list of candidate corrected words Vc in SUN, and calculates the BERTScore using such reference and candidate sentences, e.g., BERTScore is the distance between the context around the masked token and the replacement words for the masked token, e.g., words from the list of candidate corrected words Vc in SUN; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3)

However, SUN in view of ZHANG fails to explicitly teach:
candidatedistance is a probability value formed by each candidate extracted from the masked language model and the surrounding context, and a distance is calculated as a probability using a softmax function using when outputting multiple output values up to 1:N as a result, as in an output in the masked language model.

However, in a related field of endeavor, DEVLIN introduces the BERT language model that both SUN and ZHANG use.  The combination of SUN, ZHANG, and DEVLIN makes obvious:
candidatedistance is a probability value formed by each candidate extracted from the masked language model and the surrounding context, and a distance is calculated as a probability using a softmax function using when outputting multiple output values up to 1:N as a result, as in an output in the masked language model. (DEVLIN discloses that the output of the hidden layers is fed into an output softmax as in a standard language model; DEVLIN, p. 4, section 3.1; DEVLIN further discloses fine-tuning applications for BERT which further apply softmax outputs; DEVLIN, p. 5, section 4.1, p. 6, section 4.2, and p. 7, section 4.4; the SUN-ZHANG-DEVLIN combination now takes the spelling error corrector in SUN, which utilizes BERT, with an additional output layer to compute a BERTScore relating to the context distance as disclosed in ZHANG, and then further fine-tunes using an output softmax layer as disclosed in DEVLIN, where DEVLIN describes the use of an output softmax layer as “standard” for language models, and where the softmax function outputs variables corresponding to a probability distribution, e.g., outputs multiple output values up to 1:N as a result; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3; with DEVLIN, p. 4, section 3.1, p. 5, section 4.1, p. 6, section 4.2, and p. 7, section 4.4)
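The softmax step relied on from DEVLIN can be illustrated with the short Python sketch below. It assumes the masked language model has already produced one raw score (logit) per correction candidate and normalizes only over those N candidates. BERT itself applies the softmax over its whole vocabulary, so restricting it to the N candidates is an assumption made here to mirror the claimed 1:N output; the candidate words and logits are hypothetical.

```python
import math


def softmax(logits):
    """Turn N raw scores into N probabilities that are positive and sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_distances(candidates, logits):
    """Pair each of the N correction candidates with its softmax probability."""
    return dict(zip(candidates, softmax(logits)))
```

For instance, candidate_distances(["read", "red"], [0.0, 0.0]) assigns each candidate a probability of 0.5, and larger logits always receive larger probabilities.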

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of DEVLIN to SUN and ZHANG.  Indeed, both SUN and ZHANG specifically reference and build upon the BERT language model in DEVLIN, so one of ordinary skill would be motivated to review the teachings of DEVLIN and apply the teachings of DEVLIN because SUN and ZHANG are both based on DEVLIN.  Further, the examiner takes official notice that as of 11/3/2022, Google Scholar indicates that DEVLIN has been cited by over 52,000 other articles, and therefore one of ordinary skill in the art would be motivated to review and apply the teachings of DEVLIN given that it is a seminal paper in the masked language modeling space.  

Regarding claim 13, SUN in view of ZHANG and DEVLIN discloses the method of claim 12.  SUN, ZHANG, and DEVLIN further make obvious:
wherein the calculation in the masked language model is configured in a form of a vector representing all data in deep learning, and all input sentence words masksentence and candidate are converted into vectors for calculation and then input, and (SUN discloses in Algorithm 1 that the input and output sentence is vector x, with vector elements xi; SUN, p. 3, section 2.2; SUN further explicitly teaches the use of word vectors; SUN, p. 1, section 1; ZHANG explains that the BERTScore model is based on a token representation, where the embedding model utilizes a sequence of vectors; ZHANG, pp. 3-4, section 3; DEVLIN further explains that hidden layers are represented as vectors; DEVLIN, p. 4, sections 3 and 3.1; the SUN-ZHANG-DEVLIN combination therefore utilizes as input to the masked language model, which is a BERT neural network, a vector representation of the sentence, where the output is also a vector representation as disclosed in SUN; SUN, p. 1, section 1; ZHANG, pp. 3-4, section 3; DEVLIN, p. 4, sections 3 and 3.1)
the masked language model calculates the result for the input sentence in a form of a vector using a parameter that calculates an optimal result learned in advance, and calculates (softmax function) the result as a probability in a final output to output a value in which the sum of the entire result is 1. (DEVLIN discloses that the output of the hidden layers is fed into an output softmax as in a standard language model, where the accepted meaning of a softmax function is to map to an output probability distribution that has a sum of 1, e.g., the result as a probability in a final output to output a value in which the sum of the entire result is 1; DEVLIN, p. 4, section 3.1; DEVLIN further discloses fine-tuning applications for BERT which further apply softmax outputs; DEVLIN, p. 5, section 4.1, p. 6, section 4.2, and p. 7, section 4.4; the SUN-ZHANG-DEVLIN combination now takes the spelling error corrector in SUN, which utilizes BERT, with an additional output layer to compute a BERTScore relating to the context distance as disclosed in ZHANG, and then further fine-tunes using an output softmax layer as disclosed in DEVLIN, where DEVLIN describes the use of an output softmax layer as “standard” for language models, and where the softmax function outputs variables corresponding to a probability distribution, e.g., outputs multiple output values up to 1:N as a result; SUN, p. 1, section 1, pp. 2-3, section 2.2 and Algorithm 1 with ZHANG, p. 1, section 1, and pp. 3-5, section 3; with DEVLIN, p. 4, section 3.1, p. 5, section 4.1, p. 6, section 4.2, and p. 7, section 4.4)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to apply the teachings of DEVLIN to SUN and ZHANG.  Indeed, both SUN and ZHANG specifically reference and build upon the BERT language model in DEVLIN, so one of ordinary skill would be motivated to review the teachings of DEVLIN and apply the teachings of DEVLIN because SUN and ZHANG are both based on DEVLIN.  Further, the examiner takes official notice that as of 11/3/2022, Google Scholar indicates that DEVLIN has been cited by over 52,000 other articles, and therefore one of ordinary skill in the art would be motivated to review and apply the teachings of DEVLIN given that it is a seminal paper in the masked language modeling space.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cacho, Jorge Ramón Fonseca, et al. "Using the Google Web 1T 5-gram corpus for OCR Error Correction." 16th Int’l Conf. on Information Technology-New Generations (2019) pp. 505-511. Discloses the “OCRSpell” tool for spelling correction, and further discloses using the Google 5-gram corpus, e.g., a word dictionary, together with context and Levenshtein distance for spell checking and error correction in OCR.
Xie, Haihua, et al. "Automatic Chinese spelling checking and correction based on character-based pre-trained contextual representations." CCF International Conference on Natural Language Processing and Chinese Computing. Springer, 2019, pp. 540-549. Discloses a Chinese-language spelling correction technique based on the BERT masked language model (see page 543).
Hong, Yuzhong, et al. "FASPell: A fast, adaptable, simple, powerful Chinese spell checker based on DAE-decoder paradigm." Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pp. 160-169. Discloses a Chinese-language spelling correction technique based on the BERT masked language model (see Abstract).
Wilcox-O’Hearn, et al. "Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model." International Conference on Intelligent Text Processing and Computational Linguistics. 2008, pp. 605-616. Discloses a spelling correction technique using 3-grams, and further discloses modeling the technique using a large dataset (see page 609).
Roy, Shuvendu, et al. "Unsupervised context-sensitive Bangla spelling correction with character n-gram." 2019 22nd International Conference on Computer and Information Technology (ICCIT). IEEE, 2019, pp. 1-6. Discloses a spelling correction technique for n-grams, where 3-grams may be used, and further discloses using cosine similarity distance metrics to perform spelling correction.
Islam, Aminul, et al. "Real-word spelling correction using Google Web 1T 3-grams." Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 1241-1249. Discloses a technique for spelling correction utilizing the Google 3-gram database, and further discloses using the longest common subsequence, which is an editing distance technique (see section 3).
US 20220188520 A1 (Iso-Sipla et al.) at para. 0111 discloses utilizing the BERT masked language model in conjunction with a Levenshtein edit distance for automatic and semi-automatic entity validation.
US 20200334416 A1 (Vianu et al.) at paras. 0153-0154 discloses utilizing the BERT masked language model for spelling correction as part of an OCR cleaning system 410.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL C. LEE/Examiner, Art Unit 2655
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655