DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on December 6, 2019 and July 2, 2020 are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 3-5 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 3-5 each recite “a screen displayed on the display” and “a segment in which the voice is inputted to the voice input device.” Further, claims 3-5 each depend from claim 2, which also recites “a screen displayed on the display” and “a segment in which the voice is inputted to the voice input device.” Each of these elements appears to refer to the same general structure. However, based on usage in the claims, each of these elements could refer to a member of a plurality of elements (e.g., one screen of a possible plurality of screens), which renders the scope of the claims unclear.
Examiner recommends that the Applicant clarify the antecedent basis (such as through the use of the words “the” or “said”) for all subsequent elements which should be understood as the same element throughout the claims and clearly distinguish elements which should be understood as different elements (such as through the use of numerical indicators, e.g., “first” and “second”), in light of specification support. 
For examination purposes, “a screen…” and “a segment…”, as used in claims 3-5, are being treated as the same element as the respective element of claim 2.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Saito (U.S. Pat. App. Pub. No. 2018/0241899, hereinafter Saito) in view of Manabe (JP Pub. No. 2014112129A, hereinafter Manabe).

Regarding claim 1, Saito discloses An image processing system comprising (“An image forming system that serves as an information processing system”; Saito, ¶ [0017]): a touch sensor that receives a manual operation by an operator (“A user interface (UI) section 20 includes a display section and an operation section. The display section is a display device such as a liquid crystal display. The operation section is an input device such as a touch screen {touch sensor} and a keyboard, and receives various operations including a remote operation {that receives a manual operation by an operator}.”; Saito, ¶ [0026]); a voice input device that obtains voice inputted by an operator and converts the inputted voice into voice data (“A sound detection section 22 is a sound collector such as a microphone, and includes a function of detecting a sound. For example, when a user (e.g. a customer) who uses the image forming device 10 utters a voice, the voice is detected by the sound detection section 22.”; Saito, ¶ [0027]); a voice recognizer that recognizes voice from the voice data and outputs a recognition result (“the voice is detected by the sound detection section 22” where detecting the voice, as used herein, is speech/voice recognition {see the example described in paragraphs [0048]-[0049] “the user (e.g. an operator at the customer center) of the terminal device 12 utters a voice for transferring the operation authority to the image forming device 10 (S01). For example, the operator utters a voice such as “Go ahead”, “Press”, and “Please”.” where “When the sound detection section 32... detects... “Go ahead” described above) (S02), the remote operation section 36 causes the UI section 30 of the terminal device 12 to display information that indicates a transfer of the operation authority (S03)” thus speech recognition}; Saito, ¶¶ [0027], [0048]-[0049]); and an image forming apparatus that comprises a hardware processor that (“The image forming system includes an image forming device 10 and a terminal device 12,” where “Each of the image forming device 10 and the terminal device 12 is implemented through cooperation between hardware resources and software... includes one or more processors such as central processing units (CPUs),” and where the processor is a “hardware resource.”; Saito, ¶¶ [0017], [0078]). However, Saito fails to expressly recite executes processing associated in advance with the recognition result; and determines whether the manual operation is being received, and disables the inputted voice upon determining that the manual operation is being received.
Manabe teaches “a control device which does not need a user operation in order to enable a voice operation.” (Manabe, Problem). Regarding claim 1, Manabe teaches executes processing associated in advance with the recognition result (“the user operates (eg, presses) the utterance button 4 in S10. As a result, the voice input operation start instruction information is transmitted from the voice operation activation unit 4 to the operation state determination unit 5. The operation state determination unit 5 stores the transmitted voice input operation start instruction information in the memory 50 and transmits it to the voice input start / end determination unit 3.” As voice input performs a specific function (operates an utterance button) based on the receipt of the voice input, the function {processing} is associated in advance with the voice input {recognition result}, and the voice input is derived from “The voice recognition execution unit 2…[which] executes a process for recognizing a voice uttered by a user {the recognition result},” and the predetermined speech button is operated {executed}; Manabe, ¶¶ [0028], [0016]); and determines whether the manual operation is being received, and disables the inputted voice upon determining that the manual operation is being received (“When receiving the manual operation start instruction information” from a “user perform[ing] a desired manual operation on the manual operation unit 6,” “the operation state determination unit 5 stores the same information in the memory 50 {determines whether manual operation is being received} and transmits it to the voice input start / end determination unit 3,” where “the voice input start / end determination unit 3 receives the manual operation start instruction information from the operation state determination unit 5, and interrupts the voice input operation function in the voice recognition execution unit 2 {disables the inputted voice upon determining that the manual operation is being received}”; Manabe, ¶¶ [0030]-[0031]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include executes processing associated in advance with the recognition result; and determines whether the manual operation is being received and disables the inputted voice upon determining that the manual operation is being received. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).

Regarding claim 2, the rejection of claim 1 is incorporated. Saito discloses all of the elements of the current invention as stated above. However, Saito fails to expressly recite further comprising: a display, wherein the hardware processor: determines whether a segment in which the voice is inputted to the voice input device is a voice input inhibition segment, based on information on a screen displayed on the display at a time when the voice input device obtains the inputted voice, and determines that the manual operation is being received when determining that the segment is the voice input inhibition segment.
The relevance of Manabe is described above with relation to claim 1. Regarding claim 2, Manabe teaches further comprising: a display (“The display unit displays operation items by the first input unit and the second input unit”; Manabe, ¶ [0009]), wherein the hardware processor: determines whether a segment in which the voice is inputted to the voice input device is a voice input inhibition segment (The system includes “an interruption means for interrupting the function of the first input means when the second input means accepts an input by manual operation while the function of the first input means is activated” and “when the second input means stops accepting input by manual operation and a predetermined time has elapsed, the function of the first input means is activated again,” where the system prevents “the function of the first input means” {voice input} until the “predetermined time has elapsed” {voice input inhibition segment}; Manabe, ¶ [0008]), based on information on a screen displayed on the display at a time when the voice input device obtains the inputted voice (the determination of the predetermined time is based on “whether the function of the first input unit is activated or stopped {the information}” which “the display unit displays in a visible manner {on the screen displayed on the display}” and “the user is appropriately informed whether the voice input function is being activated or stopped.” Being “appropriately informed” in this context indicates that the information is provided at a time which is appropriate to the receipt of the voice input {at a time when the voice input device obtains the inputted voice} such that the system can “suppress user operation errors.”; Manabe, ¶ [0009]), and determines that the manual operation is being received when determining that the segment is the voice input inhibition segment (“in S30, the user performs a desired manual operation on the manual operation unit 6 while referring to the operation menu list displayed in S20,” and “Subsequently, in S40, the voice input start / end determination unit 3 receives the manual operation start instruction information from the operation state determination unit 5, and interrupts the voice input operation function in the voice recognition execution unit 2,” where when “the elapsed time without the manual operation” exceeds the “predetermined time” {thus, the time frame for the predetermined time is the voice input inhibition segment} voice input can resume; Manabe, ¶¶ [0031]-[0033]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include further comprising: a display, wherein the hardware processor: determines whether a segment in which the voice is inputted to the voice input device is a voice input inhibition segment, based on information on a screen displayed on the display at a time when the voice input device obtains the inputted voice, and determines that the manual operation is being received when determining that the segment is the voice input inhibition segment. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).
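For illustration only, the behavior relied upon in the mapping above (voice input interrupted while a manual operation is received, and reactivated once a predetermined time passes without manual operation) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Saito or Manabe: the class name, method names, command table, and the 3.0-second value are all hypothetical.

```python
class VoiceInputGate:
    """Hypothetical gate that ignores voice input during an inhibition segment."""

    def __init__(self, resume_delay_s=3.0):
        # Assumed "predetermined time" without manual operation (illustrative value).
        self.resume_delay_s = resume_delay_s
        self._last_manual_s = None  # time of the most recent manual operation

    def on_manual_operation(self, now_s):
        # Record that a manual operation was received at time now_s.
        self._last_manual_s = now_s

    def in_inhibition_segment(self, now_s):
        # Voice is inhibited until resume_delay_s has elapsed since the last manual operation.
        if self._last_manual_s is None:
            return False
        return (now_s - self._last_manual_s) < self.resume_delay_s

    def handle_voice(self, recognition_result, now_s, commands):
        # Disable the inputted voice while the inhibition segment is active.
        if self.in_inhibition_segment(now_s):
            return None
        # Otherwise run the processing associated in advance with the recognition result.
        action = commands.get(recognition_result)
        return action() if action else None
```

In this sketch, a recognition result arriving less than `resume_delay_s` seconds after a manual operation is dropped, and the same result arriving later is dispatched to its pre-registered action.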

Regarding claim 3, the rejection of claim 2 is incorporated. Saito discloses all of the elements of the current invention as stated above. However, Saito fails to expressly recite wherein if a screen displayed on the display at the time when the voice input part obtains the inputted voice is a job-execution-related screen that accepts an operation related to execution of a job inputted to the image processing system, the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment.
The relevance of Manabe is described above with relation to claim 1. Regarding claim 3, Manabe teaches wherein if a screen displayed on the display at the time when the voice input part obtains the inputted voice is a job-execution-related screen (“When the user presses the down key 62 (the example of S30 in FIG. 3) while the second display is being displayed, the display transitions to the third display,” where the third display is a job-execution-related screen {the screen implements numerous touch screen functions for interaction with a user, thus job-execution-related screen}. The system further considers voice operation from a user to be disabled (“icon display indicating that the voice operation is disabled.”) thus indicating the contingency of the receipt of voice input while the third display is presented and receiving manual input {thus, at the time when the voice input obtains the inputted voice}; Manabe, ¶¶ [0046], [0010]) that accepts an operation related to execution of a job inputted to the image processing system (The third display, as described with relation to the previous element, is operable based on “the manual operation.” Thus, the third display {a screen displayed on a display} accepts a manual operation {“the manual operation is performed by the user…” indicated as pressing a down key 62} to perform a display transition {an operation related to the execution of a job} and is performed by the user {inputted to the image processing system}; Manabe), the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment (“Since the manual operation is performed by the user” on the third display {job-execution-related screen} “the voice operation is disabled, and the third display displays that information to the user,” where “the icon display 95 displays an icon including a picture or character indicating that voice operation is impossible in the vicinity of the menu.” establishing the predetermined time {voice input inhibition segment}; Manabe, ¶ [0046]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include wherein if a screen displayed on the display at the time when the voice input part obtains the inputted voice is a job-execution-related screen that accepts an operation related to execution of a job inputted to the image processing system, the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).

Regarding claim 7, the rejection of claim 2 is incorporated. Saito and Manabe disclose all of the elements of the current invention as stated above. Saito further discloses wherein even when determining that the manual operation is being received (The system includes “A user interface (UI) section 20 includes a... display section... such as a liquid crystal display [and an] operation section...such as a touch screen” where operation on a touch screen of UI section 20 is manual operation being received; Saito, ¶¶ [0026], [0028]), the hardware processor enables the inputted voice if the voice data is of a second exception command related to an operation of a screen being displayed on the display (“at least one of operations on the UI section 20 may be invalidated when still another sound (corresponding to an example of the third specific sound) is detected in the terminal device 12”; Saito, ¶¶ [0028], [0039]).

Regarding claim 8, the rejection of claim 2 is incorporated. Saito and Manabe disclose all of the elements of the current invention as stated above. Saito further discloses wherein even when determining that the manual operation is being received (The system includes “A user interface (UI) section 20 includes a... display section... such as a liquid crystal display [and an] operation section...such as a touch screen” where operation on a touch screen of UI section 20 is manual operation being received; Saito, ¶¶ [0026], [0028]), the hardware processor enables the inputted voice if the voice data is of a third exception command that inquires a state of the image processing system (“at least one of operations on the UI section 20 may be invalidated when still another sound (corresponding to an example of the third specific sound) is detected in the terminal device 12” where, in response to the third specific sound, “the remote operation section 36 may validate operations on the UI section 30 of the terminal device 12 and invalidate operations on the UI section 20 of the image forming device 10 which have been validated,” and “The third specific sound may be a voice about a transfer of the operation authority from the image forming device 10 to the terminal device 12,” where “about a transfer of operation authority” is an inquiry about the state of the image processing system {specifically, the authority for operation of the image processing system}.; Saito, ¶¶ [0028], [0039]).

Regarding claim 9, Saito discloses An image forming apparatus comprising a hardware processor that (“The image forming system includes an image forming device 10 and a terminal device 12,” where “Each of the image forming device 10 and the terminal device 12 is implemented through cooperation between hardware resources and software... includes one or more processors such as central processing units (CPUs),” and where the processor is a “hardware resource.”; Saito, ¶¶ [0017], [0078]): receives manual operation inputted by an operator (“A user interface (UI) section 20 includes a display section and an operation section. The display section is a display device such as a liquid crystal display. The operation section is an input device such as a touch screen {touch sensor} and a keyboard, and receives various operations including a remote operation {that receives a manual operation by an operator}.”; Saito, ¶ [0026]); forms an image on a recording material (“An image forming section 16 performs an image forming process. For example, the image forming section 16 executes at least one of a scan function, a print function, a copy function, and a facsimile function {forming an image on a recording material},”; Saito, ¶ [0026]); [produces]…a recognition result outputted by a voice recognizer (“the voice is detected by the sound detection section 22” where detecting the voice, as used herein, is speech/voice recognition {see the example described in paragraphs [0048]-[0049] “the user (e.g. an operator at the customer center) of the terminal device 12 utters a voice for transferring the operation authority to the image forming device 10 (S01). For example, the operator utters a voice such as “Go ahead”, “Press”, and “Please”.” where “When the sound detection section 32... detects... “Go ahead” described above) (S02), the remote operation section 36 causes the UI section 30 of the terminal device 12 to display information that indicates a transfer of the operation authority (S03)” thus speech recognition}; Saito, ¶¶ [0027], [0048]-[0049]); from a voice data obtained by converting an inputted voice (“A sound detection section 22 is a sound collector such as a microphone, and includes a function of detecting a sound. For example, when a user (e.g. a customer) who uses the image forming device 10 utters a voice, the voice is detected by the sound detection section 22.”; Saito, ¶ [0027]). However, Saito fails to expressly recite executes processing associated in advance with a recognition result outputted by a voice recognizer; and determines whether the manual operation is being received, and disables the inputted voice upon determining that the manual operation is being received.
The relevance of Manabe is described above with relation to claim 1. Regarding claim 9, Manabe teaches executes processing associated in advance with the recognition result (“the user operates (eg, presses) the utterance button 4 in S10. As a result, the voice input operation start instruction information is transmitted from the voice operation activation unit 4 to the operation state determination unit 5. The operation state determination unit 5 stores the transmitted voice input operation start instruction information in the memory 50 and transmits it to the voice input start / end determination unit 3.” As voice input performs a specific function (operates an utterance button) based on the receipt of the voice input, the function {processing} is associated in advance with the voice input {recognition result}, and the voice input is derived from “The voice recognition execution unit 2…[which] executes a process for recognizing a voice uttered by a user {the recognition result},” and the predetermined speech button is operated {executed}; Manabe, ¶¶ [0028], [0016]); and determines whether the manual operation is being received, and disables the inputted voice upon determining that the manual operation is being received (“When receiving the manual operation start instruction information” from a “user perform[ing] a desired manual operation on the manual operation unit 6,” “the operation state determination unit 5 stores the same information in the memory 50 {determines whether manual operation is being received} and transmits it to the voice input start / end determination unit 3,” where  “the voice input start / end determination unit 3 receives the manual operation start instruction information from the operation state determination unit 5, and interrupts the voice input operation function in the voice recognition execution unit 2 {disables the inputted voice upon determining that the manual operation is being received}”; Manabe, ¶¶ [0030]-[0031]).
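It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include executes processing associated in advance with the recognition result; and determines whether the manual operation is being received and disables the inputted voice upon determining that the manual operation is being received. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).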

Regarding claim 10, Saito discloses A voice input inhibition determination method comprising (The method performed by the “image forming system that serves as an information processing system”; Saito, ¶ [0017]): receiving a manual operation by an operator (“A user interface (UI) section 20 includes a display section and an operation section. The display section is a display device such as a liquid crystal display. The operation section is an input device such as a touch screen {touch sensor} and a keyboard, and receives various operations including a remote operation {that receives a manual operation by an operator}.”; Saito, ¶ [0026]); obtaining a voice input and converting the inputted voice into voice data (“A sound detection section 22 is a sound collector such as a microphone, and includes a function of detecting a sound. For example, when a user (e.g. a customer) who uses the image forming device 10 utters a voice, the voice is detected by the sound detection section 22.”; Saito, ¶ [0027]); recognizing voice from the voice data and outputting a recognition result (“the voice is detected by the sound detection section 22” where detecting the voice, as used herein, is speech/voice recognition {see the example described in paragraphs [0048]-[0049] “the user (e.g. an operator at the customer center) of the terminal device 12 utters a voice for transferring the operation authority to the image forming device 10 (S01). For example, the operator utters a voice such as “Go ahead”, “Press”, and “Please”.” where “When the sound detection section 32... detects... “Go ahead” described above) (S02), the remote operation section 36 causes the UI section 30 of the terminal device 12 to display information that indicates a transfer of the operation authority (S03)” thus speech recognition}; Saito, ¶¶ [0027], [0048]-[0049]). However, Saito fails to expressly recite executing processing associated in advance with the recognition result; and determining whether the manual operation is being received and disabling the inputted voice upon determining that the manual operation is being received.
The relevance of Manabe is described above with relation to claim 1. Regarding claim 10, Manabe teaches executing processing associated in advance with the recognition result (“the user operates (eg, presses) the utterance button 4 in S10. As a result, the voice input operation start instruction information is transmitted from the voice operation activation unit 4 to the operation state determination unit 5. The operation state determination unit 5 stores the transmitted voice input operation start instruction information in the memory 50 and transmits it to the voice input start / end determination unit 3.” As voice input performs a specific function (operates an utterance button) based on the receipt of the voice input, the function {processing} is associated in advance with the voice input {recognition result}, and the voice input is derived from “The voice recognition execution unit 2…[which] executes a process for recognizing a voice uttered by a user {the recognition result},” and the predetermined speech button is operated {executed}; Manabe, ¶¶ [0028], [0016]); and determining whether the manual operation is being received, and disabling the inputted voice upon determining that the manual operation is being received (“When receiving the manual operation start instruction information” from a “user perform[ing] a desired manual operation on the manual operation unit 6,” “the operation state determination unit 5 stores the same information in the memory 50 {determines whether manual operation is being received} and transmits it to the voice input start / end determination unit 3,” where  “the voice input start / end determination unit 3 receives the manual operation start instruction information from the operation state determination unit 5, and interrupts the voice input operation function in the voice recognition execution unit 2 {disables the inputted voice upon determining that the manual operation is being received}”; Manabe, ¶¶ [0030]-[0031]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include executing processing associated in advance with the recognition result; and determining whether the manual operation is being received and disabling the inputted voice upon determining that the manual operation is being received. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).

Regarding claim 11, Saito discloses A non-transitory recording medium (“The present invention relates to an information processing apparatus and a non-transitory computer readable medium.”; Saito, ¶ [0002]) storing a computer readable program for causing a computer to execute (The method performed by the “image forming system that serves as an information processing system” performed through “one or more processors … read[ing] and execut[ing] a program stored in a storage device”; Saito, ¶¶ [0017], [0078]): receiving a manual operation by an operator (“A user interface (UI) section 20 includes a display section and an operation section. The display section is a display device such as a liquid crystal display. The operation section is an input device such as a touch screen {touch sensor} and a keyboard, and receives various operations including a remote operation {that receives a manual operation by an operator}.”; Saito, ¶ [0026]); obtaining a voice input and converting the inputted voice into voice data (“A sound detection section 22 is a sound collector such as a microphone, and includes a function of detecting a sound. For example, when a user (e.g. a customer) who uses the image forming device 10 utters a voice, the voice is detected by the sound detection section 22.”; Saito, ¶ [0027]); recognizing voice from the voice data and outputting a recognition result (“the voice is detected by the sound detection section 22” where detecting the voice, as used herein, is speech/voice recognition {see the example described in paragraphs [0048]-[0049] “the user (e.g. an operator at the customer center) of the terminal device 12 utters a voice for transferring the operation authority to the image forming device 10 (S01). For example, the operator utters a voice such as “Go ahead”, “Press”, and “Please”.” where “When the sound detection section 32... detects... “Go ahead” described above) (S02), the remote operation section 36 causes the UI section 30 of the terminal device 12 to display information that indicates a transfer of the operation authority (S03)” thus speech recognition}; Saito, ¶¶ [0027], [0048]-[0049]).
However, Saito fails to expressly recite executing processing associated in advance with the recognition result; and determining whether the manual operation is being received and disabling the inputted voice upon determining that the manual operation is being received.
The relevance of Manabe is described above with relation to claim 1. Regarding claim 11, Manabe teaches executing processing associated in advance with the recognition result (“the user operates (eg, presses) the utterance button 4 in S10. As a result, the voice input operation start instruction information is transmitted from the voice operation activation unit 4 to the operation state determination unit 5. The operation state determination unit 5 stores the transmitted voice input operation start instruction information in the memory 50 and transmits it to the voice input start / end determination unit 3.” As voice input performs a specific function (operates an utterance button) based on the receipt of the voice input, the function {processing} is associated in advance with the voice input {recognition result}, and the voice input is derived from “The voice recognition execution unit 2…[which] executes a process for recognizing a voice uttered by a user {the recognition result},” and the predetermined speech button is operated {executed}; Manabe, ¶¶ [0028], [0016]); and determining whether the manual operation is being received, and disabling the inputted voice upon determining that the manual operation is being received (“When receiving the manual operation start instruction information” from a “user perform[ing] a desired manual operation on the manual operation unit 6,” “the operation state determination unit 5 stores the same information in the memory 50 {determines whether manual operation is being received} and transmits it to the voice input start / end determination unit 3,” where “the voice input start / end determination unit 3 receives the manual operation start instruction information from the operation state determination unit 5, and interrupts the voice input operation function in the voice recognition execution unit 2 {disables the inputted voice upon determining that the manual operation is being received}”; Manabe, ¶¶ [0030]-[0031]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito to incorporate the teachings of Manabe to include executing processing associated in advance with the recognition result; and determining whether the manual operation is being received and disabling the inputted voice upon determining that the manual operation is being received. By automatically starting and stopping the audio operation based on manual operation “High convenience can be realized… [without requiring] user operation,” as recognized by Manabe. (Manabe, ¶ [0008]).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Saito and Manabe as applied to claim 2 above, and further in view of Vasilieff (U.S. Pat. App. Pub. No. 2016/0124706, hereinafter Vasilieff).

Regarding claim 4, the rejection of claim 2 is incorporated. Saito and Manabe disclose all of the elements of the current invention as stated above. However, Saito and Manabe fail to expressly recite wherein when an amount of the operation received by the touch sensor per predetermined time for a content displayed on a screen being displayed on the display exceeds an operation amount threshold, the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment.
Vasilieff teaches systems and methods for “touch gestures to initiate multi-modal speech recognition.” (Vasilieff, ¶ [0002]). Regarding claim 4, Vasilieff teaches wherein when an amount of the operation received by the touch sensor per predetermined time for a content displayed on a screen being displayed on the display exceeds an operation amount threshold (“At time 802 (0.0 s), the user touches the screen {touch sensor},” where the screen Vasilieff, ¶ [0039]), the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment (“At time 810 (1.2 s), the touch gesture ends, and the system stops audio capture as well,” where the portion of time after the 1.2 seconds is the voice input inhibition segment, and the system will thus ignore voice input into the voice input device during this segment; Vasilieff, ¶ [0039]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito as modified by the automated voice operation control device of Manabe to incorporate the teachings of Vasilieff to include wherein when an amount of the operation received by the touch sensor per predetermined time for a content displayed on a screen being displayed on the display exceeds an operation amount threshold, the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment. The automated system for multi-modal commands described here reduces the number of steps for a user “to issue a multi-modal command,” thus reducing possible “confusion and errors,” as recognized by Vasilieff. (Vasilieff, ¶ [0006]).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Saito and Manabe as applied to claim 2 above, and further in view of Taki (U.S. Pat. App. Pub. No. 2018/0081622, hereinafter Taki).

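Regarding claim 6, the rejection of claim 2 is incorporated. Saito and Manabe disclose all of the elements of the current invention as stated above. However, Saito and Manabe fail to expressly recite wherein even when determining that the manual operation is being received, the hardware processor enables the inputted voice if the voice data is of a first exception command that instructs a start of an operation input by voice.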
Taki teaches an information processing system with triggers for switching between input modes. (Taki, ¶ [0005]). Regarding claim 6, Taki teaches wherein even when determining that the manual operation is being received (“in a case where a predetermined first trigger is detected when the character-unit input mode {manual operation} is executed {being received} as the information input mode” where when the “character-unit input mode is executed … the output control portion 147 causes a character-unit input screen G10-1 to be displayed” which includes “a keyboard on the character-unit input screen G10-1.”; Taki, ¶¶ [0045], [0058], FIG. 1), the hardware processor enables the inputted voice if the voice data is of a first exception command that instructs a start of an operation input by voice (“in a case where a predetermined first trigger is detected when the character-unit input mode {manual operation} is executed as the information input mode, the mode control portion 144 switches the information input mode from the character-unit input mode to the phrase-unit input mode {enables the inputted voice}” where “the first trigger is... in one example, a predetermined speech input start [command],” and where “the speech input start operation {that instructs a start of an operation input by voice} may be an operation of executing a predetermined speech input start command (e.g., an utterance, “speech”) {the voice data is of a first exception command}.”; Taki, ¶¶ [0045]-[0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the image forming system of Saito as modified by the automated voice operation control device of Manabe to incorporate the teachings of Taki to include wherein even when determining that the manual operation is being received, the hardware processor enables the inputted voice if the voice data is of a first exception command that instructs a start of an operation input by voice. “According to the present disclosure as Taki. (Taki, ¶ [0007]).

Allowable Subject Matter
Claim 5 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
Regarding claim 5, the word “hierarchy” is understood based on the explanation given in paragraph [0078] of the instant specification, where “hierarchy of the active screen” is explained as “corresponding to the number of screen transitions.”  As such, the prior art made of record fails to expressly disclose “wherein when a hierarchy {number of screen transitions} of a screen being displayed on the display exceeds a predetermined hierarchy number threshold, the hardware processor determines that a segment in which the voice is inputted to the voice input device is the voice input inhibition segment.” 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627.  The examiner can normally be reached on 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/
Patent Examiner, Art Unit 2657  
                                                                                                                                                                                                      

/Paras D Shah/
Primary Examiner, Art Unit 2659
08/09/2021