DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on Feb. 05, 2021 has been entered.

Response to Amendment
Claims 1-20 were previously pending and subject to the final action mailed on Oct. 06, 2020. In the response filed Feb. 05, 2021, claims 1, 5, 15 and 19 were amended. Therefore, claims 1-20 are currently pending and subject to the non-final action below.

Response to Arguments
Applicant's arguments, filed Feb. 05, 2021 regarding claims 1-7 and 15-20 under 35 U.S.C. 103 have been fully considered but they are moot because the arguments do not apply to the new combination of references being used in the current rejection.

Applicant's arguments, filed Feb. 05, 2021 regarding claims 8-14 under 35 U.S.C. 103 have been fully considered but they are not persuasive.
Applicant’s Arguments: Applicant recites similar arguments, namely that Bang does not teach "capturing, by the computing device, an image of at least the text field in the user interface... performing, by the computing device, OCR on the image to create a text-recognized image... [and] identifying, by the computing device, the text field and a word in the text field of the text-recognized image, wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text" as recited in claim 1 (see pages 8-9 of Applicant’s remarks).
Examiner Response: Examiner respectfully disagrees because the amendments made to claims 1 and 15 were not made to independent claim 8. Independent claim 8 does not recite the amended limitation of “identifying, by the computing device, the text field and a word in the text field of the text-recognized image, wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text,” as recited in claims 1 and 15. The Examiner maintains that Bang in view of Patch teaches the original limitations of claim 8, as previously set forth in the final Office action mailed on Oct. 06, 2020.
Bang teaches: capture, via the voice recognition logic, an image of at least the text field in the user interface; (Bang – [0030] [0093] An image capture function that corresponds to a combination of screen pages or a combination of UI elements. [0178-0180] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user. According to an exemplary embodiment, the device 100 may acquire the information of the UI elements included in the screen pages by using information acquired in a rendering process for displaying the screen pages (e.g., execution screens) on a display (hereinafter referred to as `rendering information`).)
perform, via the OCR logic, OCR on the image to identify the editable text field and a word in the editable text field of the user interface; (Bang – [0184-0186] Analyzing text information representing the UI Elements. Extracting text information of newly displayed inputs from viewable and hidden pages. [0231-0233] For example, in Fig. 7C, the device 100 may recognize an Edit window 709 as a text input window, define the function of the Edit window 709 as ‘Text Input’, and add “Message Input Function” to the function information about the third page 730. Since the Edit window 709 includes text, the device 100 may analyze the meaning of the text including the send button 700. [0286] Details of the controls for the UI element and information are illustrated in Figs. 16A-16C. Fig. 16C includes “EditText”, resource_edit_text. [0350] Fig. 22E is an illustration of identifying the text in Edit window 709 (2208 in Fig. 22E) and analyzing the message for inputting a text into the Edit window 709 (2208 in Fig. 22E).)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. (US PGPUB: 20160034253, Pub. Date: Feb. 4, 2016 hereinafter “Bang”) in view of Patch (US PGPUB: 20110301943, Pub. Date: Dec. 8, 2011 hereinafter “Patch”) in further view of Bangalore (US PAT: 8,175,230, Pub. Date: May 8, 2012 hereinafter “Bangalore”).
Regarding independent claim 1, Bang teaches: A method for using optical character recognition (OCR) with voice recognition commands comprising: (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
receiving, by a computing device, a user interface that includes a text field for receiving text input by a user; (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
capturing, by the computing device, an image of at least the text field in the user interface; (Bang – [0030] [0093] An image capture function that corresponds to a combination of screen pages or a combination of UI elements. [0178-0180] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user. According to an exemplary embodiment, the device 100 may acquire the information of the UI elements included in the screen pages by using information acquired in a rendering process for displaying the screen pages (e.g., execution screens) on a display (hereinafter referred to as `rendering information`).)
performing, by the computing device, OCR on the image to create a text-recognized image, (Bang – [0184-0186] Analyzing text information representing the UI Elements. Extracting text information of newly displayed inputs from viewable and hidden pages. [0199] In operation S650, when the UI element selected by the user has image features, the device 100 may perform character recognition (e.g., optical character recognition (OCR)) on the UI element.)
identifying, by the computing device, the text field and a word in the text field of the text-recognized image, (Bang – [0231-0233] For example, in Fig. 7C, the device 100 may recognize an Edit window 709 as a text input window, define the function of the Edit window 709 as ‘Text Input’, and add “Message Input Function” to the function information about the third page 730. Since the Edit window 709 includes text, the device 100 may analyze the meaning of the text including the send button 700. [0286] Details of the controls for the UI element and information are illustrated in Figs. 16A-16C. Fig. 16C includes “EditText”, resource_edit_text. [0350] Fig. 22E is an illustration of identifying the text in Edit window 709 (2208 in Fig. 22E) and analyzing the message for inputting a text into the Edit window 709 (2208 in Fig. 22E).)
mapping, by the computing device, a coordinate of the word in the text field; (Bang – [0188] mapping, by the computing device, a coordinate of the word in the text field; [0231] Also, the device 100 may recognize an Edit window 709. For example, the device 100 may detect that the user inputs a text to the Edit window 709, and recognize the Edit window 709 as a text input window. In this case, the device 100 may define the function of the Edit window 709 as `Text Input` and add "Message Input Function" to the function information about the third page 730.)
navigating, by the computing device, a cursor to the coordinate; (Bang – [0209, 0231, 0350] Referring to FIG. 22E, the device 100 may perform a message transmission function by using a fourth description 2207 including function information of UI elements constituting the third page 2250. For example, since the fourth description 2207 includes information defining that a message may be transmitted through an event for inputting a text to an Edit window 2208.)
Bang teaches: receiving, by the computing device, a voice command (Bang – [0128-0129], [0209] [0231]) but does not explicitly teach: to perform an action on or around the word in the text field;
However, Patch teaches: receiving, by the computing device, a voice command to perform an action on or around the word in the text field; (Patch – [0012] Using a speech/voice command and mapping an input function to the speech/voice command. Issuing the speech command through a microphone for controlling an input device such as a mouse, a keyboard, etc. [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object. [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor. Patch thus teaches actions being performed on and around the text field(s).)
navigating, by the computing device, a cursor to the coordinate; (Patch – [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object.)
and executing, by the computing device, the command on or around the word in the text field. (Patch – [0012] [0014] [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the voice recognition and natural language engine of Bang to provide formatting and cursor speech recognition commands as taught by Patch, with a reasonable expectation of success since Bang and Patch are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to give the user the ability to format a report using voice commands, allowing faster editing of multiple text lines in a document.
Bang does not explicitly teach: wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text, 
However, Bangalore teaches: wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text, (Bangalore – [Col. 4 lines 15-30] The analysis of the webpage performed by the form parser allows for further analysis of the webpage. Many forms include required information and non-required information. An example is that to complete a form the webpage may require the user to input his name, address, phone number and an email address. These fields on the webpage are often marked with an asterisk "*" or other means such as color. The form parser 106 can identify the necessary fields by analysis of the text or colors of the webpage or other means to identify the necessary input fields. For example text fields address, city, state and zip code in Fig. 2A.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the voice recognition and natural language engine of Bang and Patch to provide a text field analysis using characteristics of the text for defining areas for entering text within a text field as taught by Bangalore, with a reasonable expectation of success since Bang, Patch and Bangalore are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to give the user the ability to format a report using voice commands, allowing faster editing of multiple text lines in a document.
Regarding independent claim 15, Bang teaches: A non-transitory computer-readable medium for using optical character recognition (OCR) with voice recognition commands that, when executed by a computing device, causes the computing device to perform at least the following: (Bang − [0028] The method may be performed using a non-transitory computer-readable recording medium having recorded thereon a program executable by a computer.)
receive a user interface that includes a text field; (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
capture an image of at least the text field in the user interface; (Bang – [0030] [0093] An image capture function that corresponds to a combination of screen pages or a combination of UI elements. [0178-0180] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user. According to an exemplary embodiment, the device 100 may acquire the information of the UI elements included in the screen pages by using information acquired in a rendering process for displaying the screen pages (e.g., execution screens) on a display (hereinafter referred to as `rendering information`).)
perform OCR on the image to create a text-recognized image: (Bang – [0184-0186] Analyzing text information representing the UI Elements. Extracting text information of newly displayed inputs from viewable and hidden pages. [0199] In operation S650, when the UI element selected by the user has image features, the device 100 may perform character recognition (e.g., optical character recognition (OCR)) on the UI element.)
identify the text field and a word in the text field from the text-recognized image, (Bang – [0231-0233] For example, in Fig. 7C, the device 100 may recognize an Edit window 709 as a text input window, define the function of the Edit window 709 as ‘Text Input’, and add “Message Input Function” to the function information about the third page 730. Since the Edit window 709 includes text, the device 100 may analyze the meaning of the text including the send button 700. [0286] Details of the controls for the UI element and information are illustrated in Figs. 16A-16C. Fig. 16C includes “EditText”, resource_edit_text. [0350] Fig. 22E is an illustration of identifying the text in Edit window 709 (2208 in Fig. 22E) and analyzing the message for inputting a text into the Edit window 709 (2208 in Fig. 22E).)
map a coordinate of the word in the text field; (Bang – [0188] mapping, by the computing device, a coordinate of the word in the text field; [0231] Also, the device 100 may recognize an Edit window 709. For example, the device 100 may detect that the user inputs a text to the Edit window 709, and recognize the Edit window 709 as a text input window. In this case, the device 100 may define the function of the Edit window 709 as `Text Input` and add "Message Input Function" to the function information about the third page 730.)
navigate a cursor to the coordinate; (Bang – [0209, 0231, 0350] Referring to FIG. 22E, the device 100 may perform a message transmission function by using a fourth description 2207 including function information of UI elements constituting the third page 2250. For example, since the fourth description 2207 includes information defining that a message may be transmitted through an event for inputting a text to an Edit window 2208.)
Bang teaches: receive a voice command (Bang – [0128-0129], [0209] [0231]) but does not explicitly teach: to perform an action on or around the word;
However, Patch teaches: receive a voice command to perform an action on or around the word; (Patch – [0012] Using a speech/voice command and mapping an input function to the speech/voice command. Issuing the speech command through a microphone for controlling an input device such as a mouse, a keyboard, etc. [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object. [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor.)
navigate a cursor to the coordinate; (Patch – [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object.)
and execute the voice command to perform the action on or around the word in a target application. (Patch – [0012] [0014] [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the voice recognition and natural language engine of Bang to provide formatting and cursor speech recognition commands as taught by Patch, with a reasonable expectation of success since Bang and Patch are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to give the user the ability to format a report using voice commands, allowing faster editing of multiple text lines in a document.
Bang does not explicitly teach: wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text, 
However, Bangalore teaches: wherein identifying the text field includes identifying a characteristic of the text field that defines an area into which a user enters text, (Bangalore – [Col. 4 lines 15-30] The analysis of the webpage performed by the form parser allows for further analysis of the webpage. Many forms include required information and non-required information. An example is that to complete a form the webpage may require the user to input his name, address, phone number and an email address. These fields on the webpage are often marked with an asterisk "*" or other means such as color. The form parser 106 can identify the necessary fields by analysis of the text or colors of the webpage or other means to identify the necessary input fields. For example text fields address, city, state and zip code in Fig. 2A.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the voice recognition and natural language engine of Bang and Patch to provide a text field analysis using characteristics of the text for defining areas for entering text within a text field as taught by Bangalore, with a reasonable expectation of success since Bang, Patch and Bangalore are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to give the user the ability to format a report using voice commands, allowing faster editing of multiple text lines in a document.

Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. in view of Patch in view of Bangalore as applied to claims 1 and 15 above, and in further view of Miyazaki et al. (Miyazaki US PAT: 7,801,730, Pub Date: Sep. 21, 2010 hereinafter “Miyazaki”).
Regarding dependent claim 2, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies. 
However, Miyazaki teaches: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies. (Miyazaki – [Col. 6 lines 10-43] Determining whether there are duplicate voice commands and, if so, displaying a warning to alert the user of multiple commands and requesting that the user further confirm the correct voice command)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to add a behavioral function when a voice command corresponds to multiple actions as taught by Miyazaki, with a reasonable expectation of success since Bang, Patch, Bangalore and Miyazaki are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve voice command control in the presence of duplicates.
Regarding dependent claim 16, Bang, Patch and Bangalore disclose all the features with respect to claim 15 as outlined above.
Bang does not explicitly teach: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies.
However, Miyazaki teaches: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies. (Miyazaki – [Col. 6 lines 10-43] Determining whether there are duplicate voice commands and, if so, displaying a warning to alert the user of multiple commands and requesting that the user further confirm the correct voice command)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to add a behavioral function when a voice command corresponds to multiple actions as taught by Miyazaki, with a reasonable expectation of success since Bang, Patch, Bangalore and Miyazaki are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve voice command control in the presence of duplicates.

Claims 3-7 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. in view of Patch in view of Bangalore as applied to claims 1 and 15 above, and in further view of Lee et al. (Lee US PGPUB: 20110123115, Pub Date: May 26, 2011 hereinafter “Lee”).
Regarding dependent claim 3, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: performing post word replacement processing on the word to improve accuracy of the OCR.
However, Lee teaches: performing post word replacement processing on the word to improve accuracy of the OCR. (Miyazaki – [0075-0076] Applicant’s Specification: post word replacement includes spell checking; Lee – [0041] Post-OCR accuracy checking includes spell checking)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 4, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image.
However, Lee teaches: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image. (Lee – [0047] The video processing module 412 optimizes the video stream based on its properties (e.g., size, color, sharpness) and properties of the screen (e.g., resolution, color depth). For example, the video processing module 412 resizes the video stream to fit the screen, tunes the image color based on the color depth of the screen, and/or adjusts other attributes of the video stream such as its sharpness for an optimal display on the screen.)
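Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.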
Regarding dependent claim 5, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: wherein identifying the characteristic that defines the area into which the user enters text includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border.
However, Lee teaches: wherein identifying the characteristic that defines the area into which the user enters text includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border. (Lee – [0008] identifying a text region in the video frame associated with the guideline, the text region comprising text; and converting the text in the text region into an editable symbolic form. [0054] the image may be cropped to include only the text (trimming the text area); Bangalore – [Col. 4 lines 15-30])
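Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.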
Regarding dependent claim 6, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: wherein performing OCR includes utilizing a character whitelist.
However, Lee teaches: wherein performing OCR includes utilizing a character whitelist. (Lee – [0043] a dictionary is a whitelist)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 7, Bang, Patch and Bangalore disclose all the features with respect to claim 1 as outlined above.
Bang does not explicitly teach: wherein identifying the text field includes performing at least one of the following: filtering erroneously recognized text or determining a left alignment of text found in the OCR.
However, Lee teaches: wherein identifying the text field includes performing at least one of the following: filtering erroneously recognized text or determining a left alignment of text found in the OCR. (Lee – [abstract] teaches aspects of performing image capture and analysis for OCR.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 17, Bang, Patch and Bangalore disclose all the features with respect to claim 15 as outlined above.
Bang does not explicitly teach: performing post word replacement processing on the word to improve accuracy of the OCR.
However, Lee teaches: performing post word replacement processing on the word to improve accuracy of the OCR. (Miyazaki – [0075-0076] Applicant’s Specification: post word replacement includes spell checking; Lee – [0041] Post-OCR accuracy checking includes spell checking)
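Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.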
Regarding dependent claim 18, Bang, Patch and Bangalore disclose all the features with respect to claim 15 as outlined above.
Bang does not explicitly teach: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image.
However, Lee teaches: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image. (Lee – [0047] The video processing module 412 optimizes the video stream based on its properties (e.g., size, color, sharpness) and properties of the screen (e.g., resolution, color depth). For example, the video processing module 412 resizes the video stream to fit the screen, tunes the image color based on the color depth of the screen, and/or adjusts other attributes of the video stream such as its sharpness for an optimal display on the screen.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 19, Bang, Patch and Bangalore disclose all the features with respect to claim 15 as outlined above.
Bang does not explicitly teach: wherein identify characteristic of the text field that defines the area into which a user enters text includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border.
However, Lee teaches: wherein identify characteristic of the text field that defines the area into which a user enters text includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border. (Lee – [0008] identifying a text region in the video frame associated with the guideline, the text region comprising text; and converting the text in the text region into an editable symbolic form. [0054] the image may be cropped to include only the text (trimming the text area) Bangalore – [Col. 4 lines 15-30])
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 20, Bang, Patch and Bangalore disclose all the features with respect to claim 15 as outlined above.
Bang does not explicitly teach: wherein performing OCR includes utilizing a character whitelist.
However, Lee teaches: wherein performing OCR includes utilizing a character whitelist. (Lee – [0043] a dictionary is a whitelist)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang, Patch and Bangalore to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch, Bangalore and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. (US PGPUB: 20160034253, Pub. Date: Feb. 4, 2016 hereinafter “Bang”) in view of Patch (US PGPUB: 20110301943, Pub. Date: Dec. 8, 2011 hereinafter “Patch”).
Regarding independent claim 8, Bang teaches: A system for using optical character recognition (OCR) with voice recognition commands comprising: (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
a computing device that includes a memory component and a processor, the memory component storing application logic, OCR logic, and voice recognition logic that, when executed by the processor, causes the computing device to perform at least the following: (Bang – [0114] Referring to FIG. 1E, the service module 12 may be implemented as a separate application type (e.g., a service module application 12'). In this case, the service module application 12' may use the API of the platform 13 and may collect rendering information and input event information of UI elements (e.g., type information of input events and identification information of pages called by the input events) from the platform 13. [0639] For example, as illustrated in FIG. 50, the device 100 according to an exemplary embodiment may further include an output unit 130, a communication unit 140, a sensing unit 150, an audio/video (A/V) input unit 160, and a storage (memory) 170 in addition to the user input unit 110 and the controller 120.)
receive, via the application logic, a user interface that includes an editable text field; (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
wherein the application logic provides a target application; (Bang – [0178-0179] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user.)
capture, via the voice recognition logic, an image of at least the text field in the user interface; (Bang – [0030] [0093] An image capture function that corresponds to a combination of screen pages or a combination of UI elements. [0178-0180] Referring to FIG. 5, in operation S510, the device 100 may acquire the information of the UI elements included in the screen pages of the application. As described above, the UI elements may be objects (e.g., texts, images, buttons, icons, and menus) that may be shown to or controlled by the user. According to an exemplary embodiment, the device 100 may acquire the information of the UI elements included in the screen pages by using information acquired in a rendering process for displaying the screen pages (e.g., execution screens) on a display (hereinafter referred to as `rendering information`).)
perform, via the OCR logic, OCR on the image to identify the editable text field and a word in the editable text field of the user interface; (Bang – [0184-0186] Analyzing text information representing the UI elements. Extracting text information of newly displayed inputs from viewable and hidden pages. [0231-0233] For example, in Fig. 7C, the device 100 may recognize an Edit window 709, recognize the Edit window 709 as a text input window, define the function of the Edit window 709 as ‘Text Input’, and add “Message Input Function” to the function information about the third page 730. Since the Edit window 709 includes text, the device 100 may analyze the meaning of the text, including the send button 700. [0286] Details of the controls for the UI elements and related information are illustrated in Figs. 16A-16C. Fig. 16C includes “EditText”, resource_edit_text. [0350] Fig. 22E is an illustration of identifying the text in the Edit window 709 (2208 in Fig. 22E) and analyzing the message for inputting a text into the Edit window 709 (2208 in Fig. 22E).)
map, via the voice recognition logic, a coordinate of the word in the editable text field; (Bang – [0188] mapping, by the computing device, a coordinate of the word in the text field; [0231] Also, the device 100 may recognize an Edit window 709. For example, the device 100 may detect that the user inputs a text to the Edit window 709, and recognize the Edit window 709 as a text input window. In this case, the device 100 may define the function of the Edit window 709 as `Text Input` and add "Message Input Function" to the function information about the third page 730.)
navigate, via the voice recognition logic, a cursor to the coordinate; (Bang – [0209, 0231, 0350] Referring to FIG. 22E, the device 100 may perform a message transmission function by using a fourth description 2207 including function information of UI elements constituting the third page 2250. For example, since the fourth description 2207 includes information defining that a message may be transmitted through an event for inputting a text to an Edit window 2208.)
Bang teaches: receive, via the voice recognition logic, a voice command (Bang – [0128-0129], [0209] [0231]) but does not explicitly teach: to perform an action on or around the word;
However, Patch teaches: receive, via the voice recognition logic, a voice command to perform an action on or around the word; (Patch – [0012] Using a speech/voice command and mapping an input function to the speech/voice command. Issuing the speech command through a microphone for controlling an input device such as a mouse, a keyboard, etc. [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object. [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor.)
navigate, via the voice recognition logic, a cursor to the coordinate; (Patch – [0014] Using a speech/voice command to specify a placement of the cursor to x-y or x-y-z coordinate system to an associated object.)
and execute, via the voice recognition logic, the voice command to perform the action on or around the word in the target application. (Patch – [0012] [0014] [0088] Using a speech/voice command to perform several actions in a single command. For example “3 Lines bold” may select then bold the three lines below the cursor, and “3 Graph Cut” may select then cut the three paragraphs below the cursor.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the voice recognition and natural language engine of Bang to provide formatting and cursor speech recognition commands as taught by Patch, with a reasonable expectation of success since Bang and Patch are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to provide the user the ability to format a report using voice commands and to edit multiple text lines in a document more quickly.
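For illustration only, the "map a coordinate of the word" and "navigate a cursor to the coordinate" limitations of claim 8 can be sketched as follows. This is a minimal sketch and is not drawn from Bang, Patch, or the application; the names `OcrWord` and `word_coordinates` and the sample bounding boxes are hypothetical, and the sketch assumes the OCR step has already produced one bounding box per recognized word.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized word and its bounding box in screen pixels (hypothetical layout)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def word_coordinates(words, target):
    """Return the center (x, y) of every OCR word that matches target, ignoring case."""
    matches = []
    for w in words:
        if w.text.lower() == target.lower():
            matches.append((w.left + w.width // 2, w.top + w.height // 2))
    return matches


# Hypothetical OCR output for the contents of an editable text field.
words = [
    OcrWord("patient", 10, 20, 60, 16),
    OcrWord("reports", 80, 20, 64, 16),
    OcrWord("pain", 150, 20, 36, 16),
]

# A voice command naming the word "reports" would resolve to this cursor target.
print(word_coordinates(words, "Reports"))  # [(112, 28)]
```

A cursor navigation step would then move the pointer to the returned coordinate before the action named in the voice command is executed.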

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. in view of Patch as applied to claim 8 above, and in further view of Miyazaki et al. (Miyazaki US PAT: 7,801,730, Pub Date: Sep. 21, 2010 hereinafter “Miyazaki”).
Regarding dependent claim 9, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies.
However, Miyazaki teaches: wherein the user interface includes a plurality of instances of the word and wherein, in response to receiving the voice command, indicators are provided on at least two of the plurality of instances of the word to determine to which of the instances the voice command applies. (Miyazaki – [Col. 6 lines 10-43] Determining whether duplicate voice commands exist and, if so, displaying a warning to alert the user of the multiple commands and requesting that the user confirm the correct voice command)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to add a behavioral function when a voice command corresponds to multiple actions as taught by Miyazaki, with a reasonable expectation of success since Bang, Patch and Miyazaki are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve voice command control in the presence of duplicates.
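For illustration only, the claim 9 limitation of providing indicators on multiple instances of a word can be sketched as follows. This is a minimal sketch of the claimed behavior under stated assumptions, not a description of Miyazaki's warning display; the function name `label_instances` and the sample sentence are hypothetical.

```python
def label_instances(words, target):
    """Assign a 1-based indicator to each occurrence of target.

    Returns a dict that maps each indicator number to the word's position in the text.
    """
    indicators = {}
    for index, word in enumerate(words):
        if word.lower() == target.lower():
            indicators[len(indicators) + 1] = index
    return indicators


# Hypothetical text field contents in which the spoken word appears three times.
text = "the scan shows the lesion near the margin".split()
labels = label_instances(text, "the")
print(labels)  # {1: 0, 2: 3, 3: 6}

# If the user answers "two", the command applies to the second instance.
chosen_position = labels[2]
print(chosen_position)  # 3
```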

Claims 10-14 are rejected under 35 U.S.C. 103 as being unpatentable over Bang et al. in view of Patch as applied to claim 8 above, and in further view of Lee et al. (Lee US PGPUB: 20110123115, Pub Date: May 26, 2011 hereinafter “Lee”).
Regarding dependent claim 10, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: performing post word replacement processing on the word to improve accuracy of the OCR.
However, Lee teaches: performing post word replacement processing on the word to improve accuracy of the OCR. (Miyazaki – [0075-0076] Applicant’s Specification: post word replacement includes spell checking; Lee – [0041] Post OCR accuracy checking includes spell checking)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 11, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image.
However, Lee teaches: performing image manipulation, wherein image manipulation includes at least one of the following: altering a scale factor, altering color bit depth, or filtering the image. (Lee – [0047] The video processing module 412 optimizes the video stream based on its properties (e.g., size, color, sharpness) and properties of the screen (e.g., resolution, color depth). For example, the video processing module 412 resizes the video stream to fit the screen, tunes the image color based on the color depth of the screen, and/or adjusts other attributes of the video stream such as its sharpness for an optimal display on the screen.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 12, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: identifying the text field includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border.  
However, Lee teaches: identifying the text field includes performing at least one of the following: identifying a target color text field, identifying a maximum threshold RGB value, identifying a minimum threshold RGB value, identifying a minimum text field height, identifying a minimum text field width, trimming the text field, or identifying a minimum text field border. (Lee – [0054] the image may be cropped to include only the text (trimming the text area))
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
Regarding dependent claim 13, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: wherein performing OCR includes utilizing a character whitelist.
However, Lee teaches: wherein performing OCR includes utilizing a character whitelist. (Lee – [0043] a dictionary is a whitelist)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.
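For illustration only, the use of a character whitelist during OCR, as recited in claim 13, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Lee or the application; the names `ALLOWED` and `apply_whitelist` and the sample string are hypothetical, and the filter is applied here to already-recognized text rather than inside an OCR engine.

```python
import string

# Hypothetical whitelist: letters, digits, and the space character.
ALLOWED = set(string.ascii_letters + string.digits + " ")


def apply_whitelist(recognized, allowed=ALLOWED):
    """Keep only the characters of the recognized text that appear in the whitelist."""
    return "".join(ch for ch in recognized if ch in allowed)


# Stray punctuation and a border artifact ("|") are dropped from the recognized text.
print(apply_whitelist("Dose: 5mg|"))  # Dose 5mg
```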
Regarding dependent claim 14, Bang and Patch disclose all the features with respect to claim 8 as outlined above.
Bang does not explicitly teach: wherein identifying the text field includes performing at least one of the following: filtering erroneously recognized text or determining a left alignment of text found in the OCR.
However, Lee teaches: wherein identifying the text field includes performing at least one of the following: filtering erroneously recognized text or determining a left alignment of text found in the OCR. (Lee – [abstract] teaches aspects of performing image capture and analysis for OCR.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Bang and Patch to include the aspect of performing image capture and analysis via OCR as taught by Lee, with a reasonable expectation of success since Bang, Patch and Lee are in the same field of endeavor of natural language processing of speech commands. The motivation to combine is to improve text recognition and reduce errors in voice commands.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL E BARNES JR whose telephone number is (571)270-3395.  The examiner can normally be reached on Monday-Friday 9 am-5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula can be reached on 571-272-4128.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CARL E BARNES JR/Examiner, Art Unit 2177   

/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2177