DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The disclosure is objected to because of the following informalities: Lines 5-6 of paragraph 0042 recite, “At step 212, the system revises the transcript of the voice input in accordance with the text input,” whereas this step appears to correspond to element 216 as shown in Figure 2. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 9, 10-13, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Stead (WO 2007/101089 A1).
Regarding Claim 1, Stead teaches a method for revising a transcription output from an automatic speech recognition (ASR) system (Reference capable of revising transcription output from an ASR system, paragraphs 42-45, interpreted as the preamble reciting an intended use), the method comprising: 
receiving a voice input from a user (Lines 1-3 on paragraph 0022, one or more conventional mechanisms that permit a user to input information such as a microphone or voice recognition device); 
determining a transcription of the voice input (Lines 1-2 on paragraph 0027, Best hypothesis words determined from ASR 202 which is a transcript of the audio message); 
displaying the transcription of the voice input (Lines 1-2 on paragraph 0027, Transcript displayer 204, displays a transcript of an audio message); 
identifying a portion of the transcription that has a likelihood of transcription error based on the output of one or more models used in determining the transcription and/or based on a gaze of the user (Lines 1-4 on paragraph 0028 and lines 1-3 on paragraph 0032, Portions of the transcription with a likelihood of error are identified with confidence scores, where the ASR 202 uses language and acoustical models for speech recognition); 

receiving a text input from the user via the text input interface indicating a revision to the transcription (Lines 1-8 on paragraph 0042, User may use a mouse to interact with the menu-type error correction tool to indicate words to be changed); and 
revising the transcription of the voice input in accordance with the text input (Lines 1-10 on paragraph 0053, once an edit is submitted, block 944 on figure 9c replaces a selected phrase with the replacement phrase entered by the user). 

Regarding Claim 2, Stead teaches the method of claim 1 (see claim 1 above), in addition Stead discloses: 
displaying a graphical indication of the portion of the transcription that has the likelihood of transcription error (Lines 1-9 on paragraph 0028, Graphical indications for words below a predetermined threshold may include gray letters, bolded letters, larger or smaller letters, italicized letters, underlined letters, as well as other visual techniques). 

Regarding Claim 3, Stead teaches the method of claim 1 (see claim 1 above), in addition Stead discloses: 
wherein the model is a general or specialized language model (Lines 1-3 on paragraph 0032, ASR 202 may update its language and acoustical models to improve speech recognition accuracy). 

Regarding Claim 4, Stead teaches the method of claim 1 (see claim 1 above), in addition Stead discloses: 
wherein the model is an acoustical language model (Lines 1-3 on paragraph 0032, interpretation of acoustical language model is both an acoustic and language model, where the ASR 202 may update its language and acoustical model to improve speech recognition accuracy). 

Regarding Claim 9, Stead teaches the method of claim 1 (see claim 1 above), in addition Stead discloses:
wherein identifying the portion of the transcription that has the likelihood of transcription error is further based on stylus input received from a stylus (Lines 4-6 on paragraph 0030, selecting an input mechanism through a pointing device, which is essentially a stylus; and lines 1-5 on paragraph 0031, stylus input for the select and replace tool).

Regarding Claim 10, Stead teaches a computing system comprising (Lines 1-7 on paragraph 0056, computer-executable instructions include, for example, instructions and data which cause a general purpose computer to perform a certain function or group of functions): 

a processor configured to execute software instructions embodied within the memory (Lines 1-4 on paragraph 0021, Exemplary system includes processor 120 and a memory 130 that stores information and instructions for execution by processor 120). Claim 10 is directed to a system claim corresponding to the method claim presented in claim 1 and is rejected under the same grounds stated above regarding claim 1.

Claim 11 is directed to a system claim corresponding to the method claim presented in claim 2 and is rejected under the same grounds stated above regarding claim 2. 

Claim 12 is directed to a system claim corresponding to the method claim presented in claim 3 and is rejected under the same grounds stated above regarding claim 3.

Claim 13 is directed to a system claim corresponding to the method claim presented in claim 4 and is rejected under the same grounds stated above regarding claim 4.

Claim 18 is directed to a system claim corresponding to the method claim presented in claim 9 and is rejected under the same grounds stated above regarding claim 9. Furthermore, a stylus operatively coupled to the processor is interpreted as a processor that is capable of detecting operation of a pointer.

Claims 1-2, 6-8, 10-11, 15-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Thörn (US 2015/0364140 A1).
Regarding Claim 1, Thörn teaches a method for revising a transcription output from an automatic speech recognition (ASR) system (Line 1 on paragraph 0068 – line 4 on paragraph 0069, Allows user to edit text generated by the speech to text conversion, interpreted as the preamble reciting an intended use), the method comprising: 
receiving a voice input from a user (Lines 1-9 on paragraph 0068, Speech to text conversion module may determine a textual representation of a spoken utterance, so it receives voice input); 
determining a transcription of the voice input (Lines 1-9 on paragraph 0068, Speech to text conversion module may determine a textual representation of a spoken utterance); 
displaying the transcription of the voice input (Lines 19-21 on paragraph 0074, Text from the speech to text conversion module is displayed on display 5); 
identifying a portion of the transcription that has a likelihood of transcription error based on the output of one or more models used in determining the transcription and/or based on a gaze of the user (Lines 1-9 on paragraph 0095, Indicates the ambiguity, i.e. scores, for the text outputted by module 123, where models are implied as they are needed for the speech to text conversion module to be operable and serve its purpose, and the use of eye gaze activates the text editing function; hence determining that there is a likelihood of error); 
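displaying a text input interface either prior to, concurrently with, or after identifying the portion of the transcription that has the likelihood of error (Lines 1-9 on paragraph 0112, User interface appears with activation areas from the output of the module; furthermore, where the text editing function may be activated for inputs of revisions); 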
receiving a text input from the user via the text input interface indicating a revision to the transcription (Lines 1-8 on paragraph 0113, User may be allowed to edit the word by selecting among other candidate words and/or by using textual character input); and 
revising the transcription of the voice input in accordance with the text input (Lines 1-8 on paragraph 0113, User may select other candidate words and/or by using textual character input as to edit the word or interword space with insertion of items such as punctuation marks or other special characters as an example). 

Regarding Claim 2, Thörn teaches the method of claim 1 (see claim 1 above), in addition Thörn discloses:
displaying a graphical indication of the portion of the transcription that has the likelihood of transcription error (Lines 1-4 on paragraph 0107, broken line where the boundary of the activation area may be displayed).

Regarding Claim 6, Thörn teaches the method of claim 1 (see claim 1 above), in addition Thörn discloses:
wherein identifying the portion of the transcription that has the likelihood of transcription error is based on the output of the one or more models and based on the gaze of the user (Line 3 on paragraph 0095 – line 6 on paragraph 0097 indicate that a score may be associated with words generated from the speech-to-text module 123 where models are 
determining the gaze of the user with a gaze tracker (Interpretation of gaze tracker is anything able to track eye direction and/or movement, Lines 4-6 on paragraph 0036, Gaze tracking device tracks eye gaze direction of a user). 

Regarding Claim 7, Thörn teaches the method of claim 6 (see claim 6 above), in addition Thörn discloses:
wherein the gaze of the user is a plurality of saccades over a given text (Saccades are interpreted as rapid eye directions or movements; lines 1-9 on paragraph 0114 discuss that the user’s eye gaze direction may move rapidly between words), the method further comprising: 
determining the given text as the portion of the transcription that has the likelihood of transcription error (Line 3 on paragraph 0095 – line 6 on paragraph 0097 indicate that a user may activate a text editing function by an eye gaze directed to the word, hence identifying a particular error in the text). 

Regarding Claim 8, Thörn teaches the method of claim 7 (see claim 7 above), in addition Thörn discloses:
using the one or more models, determining a plurality of text candidates as replacements for the portion of the transcription that has the likelihood of transcription error (Lines 1-8 on paragraph 0113, where models are implied as they are needed for the speech to 
displaying the plurality of text candidates (Lines 1-8 on paragraph 0113, where the candidate words are displayed within the user interface mentioned previously as to perform the text editing function); and
receiving an input from the user selecting one of the plurality of text candidates to replace the portion of the transcription that has the likelihood of transcription error (Lines 1-8 on paragraph 0113, The user is allowed to select among other candidate words for activation areas identified to have a likelihood of error from the conversion module).

Regarding Claim 10, Thörn teaches a computing system comprising (Lines 1-2 on paragraph 0068, Portable electronic equipment, where examples are a mobile phone, a cordless phone, a personal digital assistant (PDA) but not limited thereto, lines 1-3 on paragraph 0146):
a memory (e.g. portable electronic equipment comprises a non-volatile memory storing rules, which are used by the processing device when the text editing function is activated, Lines 1-6 on paragraph 0077; additionally consider the implication of storage in memory by virtue of the teachings of known devices, "cellular telephone," which inherently include stored instructions for execution); and 
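a processor configured to execute software instructions embodied within the memory (e.g. portable electronic equipment comprises...a processing device performs processing and control operations... executes/activates functions, Lines 1-13 on paragraph 0076). Claim 10 is directed to a system claim corresponding to the method claim presented in claim 1 and is rejected under the same grounds stated above regarding claim 1.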

Claim 11 is directed to a system claim corresponding to the method claim presented in claim 2 and is rejected under the same grounds stated above regarding claim 2. 

Claim 15 is directed to a system claim corresponding to the method claim presented in claim 6 and is rejected under the same grounds stated above regarding claim 6. 

Claim 16 is directed to a system claim corresponding to the method claim presented in claim 7 and is rejected under the same grounds stated above regarding claim 7. 

Claim 17 is directed to a system claim corresponding to the method claim presented in claim 8 and is rejected under the same grounds stated above regarding claim 8. 

Regarding Claim 19, Thörn teaches a computing system (Lines 1-2 on paragraph 0068, Portable electronic equipment, where examples are a mobile phone, a cordless phone, a personal digital assistant (PDA) but not limited thereto, lines 1-3 on paragraph 0146) comprising:
a memory (e.g. portable electronic equipment comprises a non-volatile memory storing rules, which are used by the processing device when the text editing function is activated, Lines 1-6 on paragraph 0077; additionally consider the implication of storage in memory by virtue of the 
a gaze tracker (Interpretation of gaze tracker is anything able to track eye direction and/or movement, Lines 4-6 on paragraph 0036, Gaze tracking device tracks eye gaze direction of a user); and
a processor configured to execute software instructions embodied within the memory (e.g. portable electronic equipment comprises...a processing device performs processing and control operations... executes/activates functions, Lines 1-13 on paragraph 0076) to:
receive a voice input from a user (Lines 1-9 on paragraph 0068, Speech to text conversion module may determine a textual representation of a spoken utterance, so it receives voice input);
determine a transcription of the voice input (Lines 1-9 on paragraph 0068, Speech to text conversion module may determine a textual representation of a spoken utterance);
display the transcription of the voice input (Lines 19-21 on paragraph 0074, Text from the speech to text conversion module is displayed on display 5);
identify a portion of the transcription that has a likelihood of transcription error based at least on a gaze of the user determined by the gaze tracker (Line 3 on paragraph 0095 – line 6 on paragraph 0097 indicate that a user may activate a text editing function by an eye gaze directed to the word, hence identifying a particular error in the text; Lines 4-6 on paragraph 0036, Gaze tracking device tracks eye gaze direction of a user);

display a text input interface (Lines 1-9 on paragraph 0112, User interface appears with activation areas from the output of the module; furthermore, where the text editing function may be activated for inputs of revisions);
receive a text input from the user via the text input interface indicating a revision to the transcription (Lines 1-8 on paragraph 0113, User may be allowed to edit the word by selecting among other candidate words and/or by using textual character input); and
revise the transcription of the voice input in accordance with the text input (Lines 1-8 on paragraph 0113, User may select other candidate words and/or by using textual character input as to edit the word or interword space with insertion of items such as punctuation marks or other special characters as an example). 

Regarding Claim 20, Thörn teaches the computing system of claim 19 (see claim 19 above), in addition Thörn discloses:
wherein the processor is configured to identify the portion of the transcription that has the likelihood of transcription error based on an output of one or more models and based on the gaze of the user comprising a plurality of saccades over a given text (Lines 1-9 on paragraph 0075 discuss that portable electronic equipment 1 comprises a processing device 4 coupled to the gaze tracking device, where the processing device may be one or more processors to perform processing and control operations, Line 3 on paragraph 0095 – line 6 on paragraph 0097 
the processor is configured to determine the given text as the portion of the transcription that has the likelihood of transcription error (Lines 1-9 on paragraph 0095, indicates that the ambiguity i.e. scores for the text outputted by module 123 and the use of eye gaze activates the text editing function; hence determining that there is a likelihood of error).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Stead, in view of Thomson (U.S. Patent No. 10,388,272 B1).
Regarding Claim 5, Stead teaches the method of claim 1 (see claim 1 above). However, while Stead teaches models such as an acoustical model and a language model, it is silent as to the specific type of language model and therefore fails to teach wherein the model is a character language model.
In a related field of endeavor (e.g. editing transcriptions), Thomson discloses systems and methods for reducing the inaccuracy and time required to generate transcriptions with editing capabilities (Lines 55-58 on column 4). Furthermore, Thomson teaches wherein the model used for ASR may be a language model including subword probabilities where subwords may be phonemes, syllables, characters, or other subword units (Lines 56-61 on column 41), where ASR system 520 may be an example of the ASR systems 120 of FIG. 1 (Lines 25-29 on column 41). 
Stead, as modified to use the techniques disclosed by Thomson, teaches:
wherein the model is a character language model (e.g. Stead’s method for revising a transcription output from an ASR system using an acoustical and language model, now also using the character language model feature as taught by Thomson). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Thomson to the method of Stead.   Including Thomson’s features would have improved Stead by providing a secondary language model that includes subword probabilities i.e. character language model, where it can handle out-of-vocabulary words that were not previously present in the limited language model (Lines 

Claim 14 is directed to a system claim corresponding to the method claim presented in claim 5 and is rejected under the same grounds stated above regarding claim 5.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Sheeder (CA 3059234 A1) discloses that identifying the portion of the transcription that has the likelihood of transcription error is based on the output of the one or more models and based on the gaze of the user, through a wearable system that outputs confidence scores and tracks the gaze of the user with a gaze tracker to determine a likelihood of error in a transcription. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination. No single feature or group of features is necessary or indispensable to each and every embodiment. The hardware processor is also programmed to receive multimodal inputs for a user interaction and is in communication with the sensor and display system.

	Slaney (WO 2016/049439 A1) discloses that ASR transcriptions may include errors in the form of erroneous words within the transcript; furthermore, it goes into detail on how eye gaze features are determined by a gaze tracker to further determine intent. The multi-modal communication then reduces the error rate in identifying visual elements that are intended targets of a user utterance. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN E AMAYA HERNANDEZ whose telephone number is (571)272-2484. The examiner can normally be reached Monday - Thursday 7:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/J.E.A./ Examiner, Art Unit 2655

/ANDREW C FLANDERS/ Supervisory Patent Examiner, Art Unit 2655