DETAILED ACTION
This action is responsive to the following communication: the claims filed on 03/10/2020.  This action is made Non-Final.
Claims 1-17 are pending in the case.  Claims 1 and 13-17 are independent claims.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . 
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 
The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: “a first acquiring unit”, “a second acquiring unit”, “a third acquiring unit”, and “an execution unit” in claims 1 and 13.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-13 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The claims recite, in part, “a first acquiring unit”, “a second acquiring unit”, “a third acquiring unit”, and “an execution unit” in claims 1 and 13. The specification contains no corresponding description of the structures that perform the functions of these units, and the claims therefore do not meet the written description requirement.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
Claim limitations “a first acquiring unit”, “a second acquiring unit”, “a third acquiring unit”, and “an execution unit” in claims 1 and 13 invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The specification is devoid of adequate structure to perform the claimed function. In particular, the specification states only the claimed function of each term used. There is no disclosure of any particular structure, either explicitly or inherently, to perform the corresponding functions recited for the invoked units. The terms “first acquiring unit”, “second acquiring unit”, “third acquiring unit”, and “execution unit” are not adequate structure for performing said functions because they do not describe a particular structure for performing the functions. As would be recognized by those of ordinary skill in the art, these terms refer only to the function to be performed, which can be performed in any number of ways in hardware, software, or a combination of the two. The specification does not provide sufficient details such that one of ordinary skill in the art would understand which structure or structures perform(s) the claimed function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.  The dependent claims incorporate the deficiencies of the claims upon which they depend and are therefore also rejected.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kam et al. (US 2017/0031652 A1; hereinafter as Kam) in view of Hu (US 2014/0006028 A1; hereinafter as Hu).

As to claims 1, 14, and 16, Kam discloses:
An information processing apparatus (see Fig. 1 and ¶¶ 0034-0038, 0118), a method (see ¶¶ 0016, 0115), a non-transitory computer-readable storage medium (see ¶ 0117) comprising: 
a first acquiring unit configured to acquire a command input to application software (see Fig. 1 and ¶¶ 0036-0037; the command receiver 100 receives an input of a voice command regarding navigation of the screen.  ¶ 0052; the user may input a primary command “Play” and then input an additional command “car advertisement” after a certain time interval, or the user may input both commands together, i.e., “Play car advertisement”.  ¶ 0060; when the user inputs the voice command “search for cars” while a web page is being displayed on the screen, the preprocessor 310 may extract keywords “search for” and “cars” from the voice command; understand the meaning of extracted keywords; determine that the user wanted to input the keyword “Cars” in a search window of the web page; and perform an operation corresponding to clicking the search button. At this time, the preprocessor 310 may check the search window content in the web page using an analysis result, e.g., a semantic map, which was obtained from the screen analyzer 200); 
a second acquiring unit configured to acquire scene information representing a scene represented by a screen displayed when executing the application software (see Fig. 1 and ¶ 0038; The screen analyzer 200 may analyze content displayed on a screen of a display device, or the display 120, and generate a content analysis result. The content may include any entity displayed on the screen, such as various applications, messages, emails, documents, songs, videos, images, and other entities (e.g., text input windows, click buttons, dropdown menus, etc.). The content analysis result may include, as described below, either or both of a semantic map and a screen index, where the semantic map represents the meaning of content and the screen index guides the user to designate a location on the screen. However, the content analysis result is not limited thereto.  See Fig. 2 and ¶¶ 0040-0046, 0050; the screen analyzer includes an index display and/or a semantic map generator); 
a third acquiring unit configured to acquire a command [file] based on the command and the scene information (see ¶ 0056; the command composer 300 may interpret the voice command based on the content analysis result from the screen analyzer 200 and then convert said voice command into a navigation command.  See Figs. 3A-3B and ¶ 0058-0065; the command composer includes a preprocessor 310 and a command converter 320; ¶ 0068; the command composer 300 may compose a navigation command from a voice command, using a command set database (DB) 350. The command set DB 350 may include a memory configured to be a part of the command composer 300 or separate.  ¶0069; The command set DB 350 may store command sets in the memory, where each command set is generated by mapping a common executable command (e.g., mouse click) that is carried out by the screen navigation apparatus 1 to a predefined keyword (e.g., search). As shown in FIG. 3C, the command set DB 350 may include a common command set DB 351 and/or a user command set DB 352.  ¶0070; common command sets may refer to command sets in which executable commands that commonly carried out in an operating system of the screen navigation apparatus 1 or on a basic platform for providing screen content are mapped with major keywords related to a voice command that is commonly input by users.  ¶0075; the command composer 300 may translate an input command, which may be a primary command and/or an additional command. Then the command composer 300 may refer to the command set DB 350 to extract a command that corresponds to the input command, and may create a navigation command using the extracted command. Here, if the referenced command is defined in the personalized user command set DB 352, the user may be able to create the navigation command more promptly. 
Before the user has finished inputting his or her voice command, i.e., while the user's voice command is still being input, the command composer 300 may translate the speech that is currently being inputted, and create a plurality of navigation commands to be executed in stages. Thus, the command composer 300 translates or extracts the user's speech in real time and may even predict a number of possible user commands based on the real time translation of the user's speech. As an example, in the case where the user regularly watches a weather program at a certain time of day, when the user begins to input a voice command, such as “run”, “search”, or “play”, the command composer 300 may extract the input in real time and predict the voice command to be “run a weather application,” “search for today's weather,” or “play the found program”, respectively. The command composer 300 may also base the prediction on the time of day the voice command is input. By predicting the voice command of the user, the command composer 300 may reduce the time for executing the voice command as compared to not predicting the command.  ¶ 0078; the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the users gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. 
Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command); and 
an execution unit configured to execute processing in accordance with the command [file] (see Fig. 1 and ¶ 0076; the command executer 400 executes the command created by the command composer 300 of FIG. 1 to perform a corresponding navigation operation on the screen. For example, in response to the navigation command created by the command composer 300, the command executer 400 may highlight a specific keyword on the screen or navigate the screen to search for a new keyword. The command executer 400 may also carry out web browsing or a move to a previous or next page of the current page. In addition, the command executer 400 may zoom in on a particular area of the screen, open a link, or navigate files to play voice/image/video files. Further, the user may display the content of a particular email or message or search for emails and/or messages received on a specific date. In this case, if the command composer 300 has generated multiple navigation commands to be executed in stages according to the user's command, the command executer 400 may execute said commands in multiple stages and may sequentially display each execution result on the screen. In addition, as noted above and as only examples, the command composer 300 of FIG. 1 may be configured according to any or any combination of configurations of the command composers 300 of FIGS. 3A-3C, noting that embodiments are not limited thereto).  
Kam does not teach a command file.
Hu is relied upon to cure this deficiency.  Specifically, Hu teaches an apparatus, a method, and a readable medium (see Fig. 1, ¶¶ 0017-0021) configured to acquire a command file based on a voice command (see Fig. 3 and ¶ 0023; when the mobile device receives an audio command from the user, it may then compare the received audio command with the stored audio samples to determine whether the audio command matches any of the stored audio samples.  See Fig. 8 and ¶ 0104; FIG. 8 depicts an example of an application commands file for an application, in accordance with some implementations. In some implementations, an application of mobile device includes executable actions that are associated with command texts. For example, an email application of a mobile device is configured to respond to a command to compose a new message by opening up an interface for composing a new message. Similarly, every application includes a list of executable actions and the corresponding commands that trigger the executable actions. FIG. 8 contains an example of such a list for the Chatter® application. The voice command text column 810 lists the command texts that the application responds to, and the action column 820 lists the executable actions corresponding to each command text).
Both references disclose an apparatus for processing audio commands. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the audio command processing disclosed in Kam to include the feature of acquiring a command file, as disclosed by Hu, in order to identify the command corresponding to the audio input as claimed.  One skilled in the art would have been motivated to make such a combination in order to determine the executable commands easily using the command file (Hu: see ¶ 0104).
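For illustration only, the proposed combination can be pictured with a minimal sketch. It is not code from Kam or Hu. It assumes a hypothetical command file that pairs a keyword with an action (in the spirit of the command text and action columns described for Hu's FIG. 8) and a hypothetical set of on-screen object types standing in for Kam's content analysis result. All names (`CommandEntry`, `COMMAND_FILE`, `acquire_command_entry`, `execute`) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class CommandEntry:
    """One row of a hypothetical command file: an action plus the
    on-screen object type it needs (None means no requirement)."""
    action: str
    requires: Optional[str]


# Hypothetical command file: keyword -> entry. Contents are illustrative only.
COMMAND_FILE: Dict[str, CommandEntry] = {
    "search": CommandEntry("click_search_button", "search_window"),
    "play": CommandEntry("play_media", "video"),
    "back": CommandEntry("go_to_previous_page", None),
}


def acquire_command_entry(command: str, scene_objects: Set[str]) -> Optional[CommandEntry]:
    """Pick the first keyword of the input command that has an entry in the
    command file and whose required object type is present in the scene."""
    for keyword in command.lower().split():
        entry = COMMAND_FILE.get(keyword)
        if entry is None:
            continue
        if entry.requires is None or entry.requires in scene_objects:
            return entry
    return None


def execute(entry: CommandEntry) -> str:
    """Stand-in for an execution step; it only reports the chosen action."""
    return f"executed {entry.action}"


if __name__ == "__main__":
    scene = {"search_window", "button"}
    found = acquire_command_entry("Search for cars", scene)
    if found is not None:
        print(execute(found))
```

In this sketch the same spoken command resolves to an entry only when the displayed screen contains the object the action needs, which mirrors the "based on the command and the scene information" language of the claim at a very coarse level.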

As to claims 13, 15, and 17, Kam discloses:
An information processing apparatus (see Fig. 1 and ¶¶ 0034-0038, 0118), a method (see ¶¶ 0016, 0115), a non-transitory computer-readable storage medium (see ¶ 0117) comprising: 
a first acquiring unit configured to acquire scene information representing a scene represented by a screen displayed when a condition is satisfied (see Fig. 1 and ¶ 0038; The screen analyzer 200 may analyze content displayed on a screen of a display device, or the display 120, and generate a content analysis result. The content may include any entity displayed on the screen, such as various applications, messages, emails, documents, songs, videos, images, and other entities (e.g., text input windows, click buttons, dropdown menus, etc.). The content analysis result may include, as described below, either or both of a semantic map and a screen index, where the semantic map represents the meaning of content and the screen index guides the user to designate a location on the screen. However, the content analysis result is not limited thereto.  See Fig. 2 and ¶¶ 0040-0046, 0050; the screen analyzer includes an index display and/or a semantic map generator.  ¶ 0061; if a condition such as a voice command is detected); 
a second acquiring unit configured to acquire a command [file] based on the scene information (see ¶ 0056; the command composer 300 may interpret the voice command based on the content analysis result from the screen analyzer 200 and then convert said voice command into a navigation command.  See Figs. 3A-3B and ¶ 0058-0065; the command composer includes a preprocessor 310 and a command converter 320; ¶ 0068; the command composer 300 may compose a navigation command from a voice command, using a command set database (DB) 350. The command set DB 350 may include a memory configured to be a part of the command composer 300 or separate.  ¶0069; The command set DB 350 may store command sets in the memory, where each command set is generated by mapping a common executable command (e.g., mouse click) that is carried out by the screen navigation apparatus 1 to a predefined keyword (e.g., search). As shown in FIG. 3C, the command set DB 350 may include a common command set DB 351 and/or a user command set DB 352.  ¶0070; common command sets may refer to command sets in which executable commands that commonly carried out in an operating system of the screen navigation apparatus 1 or on a basic platform for providing screen content are mapped with major keywords related to a voice command that is commonly input by users.  ¶0075; the command composer 300 may translate an input command, which may be a primary command and/or an additional command. Then the command composer 300 may refer to the command set DB 350 to extract a command that corresponds to the input command, and may create a navigation command using the extracted command. Here, if the referenced command is defined in the personalized user command set DB 352, the user may be able to create the navigation command more promptly. 
Before the user has finished inputting his or her voice command, i.e., while the user's voice command is still being input, the command composer 300 may translate the speech that is currently being inputted, and create a plurality of navigation commands to be executed in stages. Thus, the command composer 300 translates or extracts the user's speech in real time and may even predict a number of possible user commands based on the real time translation of the user's speech. As an example, in the case where the user regularly watches a weather program at a certain time of day, when the user begins to input a voice command, such as “run”, “search”, or “play”, the command composer 300 may extract the input in real time and predict the voice command to be “run a weather application,” “search for today's weather,” or “play the found program”, respectively. The command composer 300 may also base the prediction on the time of day the voice command is input. By predicting the voice command of the user, the command composer 300 may reduce the time for executing the voice command as compared to not predicting the command.  ¶ 0078; the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the users gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. 
Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command); and 
an execution unit configured to execute processing in accordance with the command [file] (see Fig. 1 and ¶ 0076; the command executer 400 executes the command created by the command composer 300 of FIG. 1 to perform a corresponding navigation operation on the screen. For example, in response to the navigation command created by the command composer 300, the command executer 400 may highlight a specific keyword on the screen or navigate the screen to search for a new keyword. The command executer 400 may also carry out web browsing or a move to a previous or next page of the current page. In addition, the command executer 400 may zoom in on a particular area of the screen, open a link, or navigate files to play voice/image/video files. Further, the user may display the content of a particular email or message or search for emails and/or messages received on a specific date. In this case, if the command composer 300 has generated multiple navigation commands to be executed in stages according to the user's command, the command executer 400 may execute said commands in multiple stages and may sequentially display each execution result on the screen. In addition, as noted above and as only examples, the command composer 300 of FIG. 1 may be configured according to any or any combination of configurations of the command composers 300 of FIGS. 3A-3C, noting that embodiments are not limited thereto).  
Kam does not teach a command file.
Hu is relied upon to cure this deficiency.  Specifically, Hu teaches an apparatus, a method, and a readable medium (see Fig. 1, ¶¶ 0017-0021) configured to acquire a command file based on a voice command (see Fig. 3 and ¶ 0023; when the mobile device receives an audio command from the user, it may then compare the received audio command with the stored audio samples to determine whether the audio command matches any of the stored audio samples.  See Fig. 8 and ¶ 0104; FIG. 8 depicts an example of an application commands file for an application, in accordance with some implementations. In some implementations, an application of mobile device includes executable actions that are associated with command texts. For example, an email application of a mobile device is configured to respond to a command to compose a new message by opening up an interface for composing a new message. Similarly, every application includes a list of executable actions and the corresponding commands that trigger the executable actions. FIG. 8 contains an example of such a list for the Chatter® application. The voice command text column 810 lists the command texts that the application responds to, and the action column 820 lists the executable actions corresponding to each command text).
Both references disclose an apparatus for processing audio commands. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the audio command processing disclosed in Kam to include the feature of acquiring a command file, as disclosed by Hu, in order to identify the command corresponding to the audio input as claimed.  One skilled in the art would have been motivated to make such a combination in order to determine the executable commands easily using the command file (Hu: see ¶ 0104).

As to claim 2, the rejection of claim 1 is incorporated. Kam and Hu further teach: wherein the second acquiring unit acquires, as the scene information, a type of an object displayed on the screen and a layout of the object (Kam: see ¶ 0010; The processor may further include a screen analyzer configured to analyze the content displayed on the screen and generate the content analysis result. The screen analyzer may be configured to analyze the content using one or more of the following techniques: source analysis, text analysis, speech recognition, image analysis and context information analysis. The content analysis result may include a semantic map or a screen index, or both, wherein the semantic map represents a determined meaning of the content displayed on the screen, and the screen index indicates a determined position of the content displayed on the screen. The screen index may include at least one of the following items: coordinates, grids, and identification symbols, and the screen analyzer determines at least one of a type, size, and position of the screen index to be displayed on the screen by taking into account at least one of the following factors: coordinates of the screen index, a screen resolution, and positions and distribution of key contents on the screen, and displays the screen index on the screen based on the determination. In response to a user selecting one of screen indices displayed on the screen by a user's speech, eye-gaze, or gesture, or any combination thereof, the command composer may be configured to interpret the voice command based on screen position information that corresponds to the selected screen index).  
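For illustration only, the notion of a grid-type screen index that yields screen position information can be sketched as below. This is a minimal sketch under stated assumptions, not Kam's implementation: it assumes a uniform grid numbered from 1 in row-major order, and the function name `grid_cell_bounds` and its parameters are hypothetical.

```python
from typing import Tuple


def grid_cell_bounds(index: int, width: int, height: int,
                     rows: int, cols: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a 1-based grid
    identifier, assuming a uniform, row-major grid over the screen."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    if not 1 <= index <= rows * cols:
        raise ValueError("grid identifier out of range")
    row, col = divmod(index - 1, cols)
    cell_w = width // cols   # integer cell size; any remainder pixels are ignored
    cell_h = height // rows
    left, top = col * cell_w, row * cell_h
    return (left, top, left + cell_w, top + cell_h)


if __name__ == "__main__":
    # Hypothetical 1920x1080 screen divided into 3 rows and 4 columns.
    print(grid_cell_bounds(6, 1920, 1080, rows=3, cols=4))
```

A selected grid identifier thus maps to a screen region, and a command could then be interpreted against whatever object is laid out in that region.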

As to claim 3, the rejection of claim 1 is incorporated. Kam and Hu further teach: wherein the second acquiring unit acquires, as the scene information, an image most similar to a screen displayed when executing the application software, out of a plurality of images collected in advance as the screen of the application software (Kam: see ¶ 0010, 0047-0048; the screen analyzer may be configured to analyze the content using one or more of the following techniques: source analysis, text analysis, speech recognition, image analysis and context information analysis. The content analysis result may include a semantic map or a screen index, or both, wherein the semantic map represents a determined meaning of the content displayed on the screen, and the screen index indicates a determined position of the content displayed on the screen. The screen index may include at least one of the following items: coordinates, grids, and identification symbols, and the screen analyzer determines at least one of a type, size, and position of the screen index to be displayed on the screen by taking into account at least one of the following factors: coordinates of the screen index, a screen resolution, and positions and distribution of key contents on the screen, and displays the screen index on the screen based on the determination. In response to a user selecting one of screen indices displayed on the screen by a user's speech, eye-gaze, or gesture, or any combination thereof, the command composer may be configured to interpret the voice command based on screen position information that corresponds to the selected screen index).  

As to claim 4, the rejection of claim 1 is incorporated. Kam and Hu further teach:
wherein the third acquiring unit acquires a command file corresponding to the command if the command file is found (Kam: see ¶ 0056; the command composer 300 may interpret the voice command based on the content analysis result from the screen analyzer 200 and then convert said voice command into a navigation command.  See Figs. 3A-3B and ¶ 0058-0065; the command composer includes a preprocessor 310 and a command converter 320; ¶ 0068; the command composer 300 may compose a navigation command from a voice command, using a command set database (DB) 350. The command set DB 350 may include a memory configured to be a part of the command composer 300 or separate.  ¶0069; The command set DB 350 may store command sets in the memory, where each command set is generated by mapping a common executable command (e.g., mouse click) that is carried out by the screen navigation apparatus 1 to a predefined keyword (e.g., search). As shown in FIG. 3C, the command set DB 350 may include a common command set DB 351 and/or a user command set DB 352.  ¶0070; common command sets may refer to command sets in which executable commands that commonly carried out in an operating system of the screen navigation apparatus 1 or on a basic platform for providing screen content are mapped with major keywords related to a voice command that is commonly input by users.  ¶0075; the command composer 300 may translate an input command, which may be a primary command and/or an additional command. Then the command composer 300 may refer to the command set DB 350 to extract a command that corresponds to the input command, and may create a navigation command using the extracted command. Here, if the referenced command is defined in the personalized user command set DB 352, the user may be able to create the navigation command more promptly. 
Before the user has finished inputting his or her voice command, i.e., while the user's voice command is still being input, the command composer 300 may translate the speech that is currently being inputted, and create a plurality of navigation commands to be executed in stages. Thus, the command composer 300 translates or extracts the user's speech in real time and may even predict a number of possible user commands based on the real time translation of the user's speech. As an example, in the case where the user regularly watches a weather program at a certain time of day, when the user begins to input a voice command, such as “run”, “search”, or “play”, the command composer 300 may extract the input in real time and predict the voice command to be “run a weather application,” “search for today's weather,” or “play the found program”, respectively. The command composer 300 may also base the prediction on the time of day the voice command is input. By predicting the voice command of the user, the command composer 300 may reduce the time for executing the voice command as compared to not predicting the command.  ¶ 0078; the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the users gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command.  
Hu: see Fig. 3 and ¶ 0023; when the mobile device receives an audio command from the user, it may then compare the received audio command with the stored audio samples to determine whether the audio command matches any of the stored audio samples.  See Fig. 8 and ¶ 0104; FIG. 8 depicts an example of an application commands file for an application, in accordance with some implementations. In some implementations, an application of a mobile device includes executable actions that are associated with command texts. For example, an email application of a mobile device is configured to respond to a command to compose a new message by opening up an interface for composing a new message. Similarly, every application includes a list of executable actions and the corresponding commands that trigger the executable actions. FIG. 8 contains an example of such a list for the Chatter® application. The voice command text column 810 lists the command texts that the application responds to, and the action column 820 lists the executable actions corresponding to each command text).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the feature of processing audio commands disclosed in Kam to include the feature of acquiring a command file as disclosed by Hu, so as to identify the command corresponding to the audio input as claimed.  One skilled in the art could have been motivated to make such a combination in order to determine the executable commands easily using the command file (Hu: see ¶ 0104).
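
For illustration only, the keyword-to-command mapping described in the cited passages (Kam's command set DB 350 and the two-column command list of Hu's FIG. 8) can be sketched as a simple lookup. This is a minimal sketch under stated assumptions and not the implementation of either reference: the dictionary names, the keyword entries other than "search" mapped to a mouse click, and the choice to consult the personalized set before the common set are all hypothetical.

```python
from typing import Optional

# Hypothetical common command set: keyword -> executable command
# (loosely modeled on the common command set DB 351 described in Kam).
COMMON_COMMANDS = {"search": "mouse_click", "enlarge": "zoom_in"}

# Hypothetical personalized command set
# (loosely modeled on the user command set DB 352 described in Kam).
USER_COMMANDS = {"find": "mouse_click"}


def lookup_command(recognized_text: str) -> Optional[str]:
    """Return the executable command for the first known keyword, or None."""
    for word in recognized_text.lower().split():
        # Assumption: personalized entries are consulted before the common set.
        if word in USER_COMMANDS:
            return USER_COMMANDS[word]
        if word in COMMON_COMMANDS:
            return COMMON_COMMANDS[word]
    # No keyword matched, so no command is found.
    return None
```

In this sketch a return value of None corresponds to the case in which no matching command is found.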

As to claim 5, the rejection of claim 1 is incorporated. Kam and Hu further teach: 
wherein the third acquiring unit acquires a command file corresponding to a set of the command and the scene information if a command file corresponding to the command cannot be found (Hu: see Fig. 3 and ¶ 0091; the computing device performing method 300 determines that the first audio sample does not match any of one or more audio samples stored in a local dictation database of the computing device. The local dictation database of the computing device may store audio samples that correspond to executable actions that may be executed in the operating system or an application of the computing device. In some implementations, the computing device may make this determination using a speech recognition algorithm or application running on the computing device that compares two audio samples to detect a match. In some implementations, when the first audio sample does not match any of the audio samples stored in the local dictation database, it may be that the first audio sample does not contain a recognizable command. Alternatively, the first audio sample may contain a command that the local speech recognition application does not recognize but can be recognized by a remote speech recognition application running on a remote server. In FIG. 3, at block 330, the computing device performing method 300 transmits the first audio sample to a remote server for detection of one or more words indicated by the first audio sample. In some implementations, the remote server may possess more processing power and more sophisticated speech recognition algorithms than those possessed by the computing device. The first audio sample may be transmitted to the remote server, where a speech recognition application on the remote server may recognize and identify the one or more words that are being spoken in the first audio sample. In the example of the Chatter® mobile application user, the audio sample that was recorded by the mobile device may be transmitted to a salesforce.com, inc.-hosted server, where the audio sample is processed to identify the words that the user spoke).  
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the feature of processing audio commands disclosed in Kam to include the feature of acquiring a command file as disclosed by Hu, so as to identify the command corresponding to the audio input as claimed.  One skilled in the art could have been motivated to make such a combination in order to determine the executable commands easily using the command file (Hu: see ¶ 0104).
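
For illustration only, the fallback described in the cited Hu passage (no local match, then transmission to a remote server for word detection) can be sketched as follows. This is a minimal sketch under stated assumptions and not Hu's implementation: audio samples are stood in for by string identifiers, the remote recognizer is a caller-supplied function, and every name and entry below is hypothetical.

```python
from typing import Callable, Dict, Optional


def resolve_action(
    sample_id: str,
    local_db: Dict[str, str],
    remote_recognize: Callable[[str], Optional[str]],
    command_file: Dict[str, str],
) -> Optional[str]:
    """Resolve an executable action, falling back to remote recognition."""
    # Step 1: try the local dictation database (sample -> action).
    action = local_db.get(sample_id)
    if action is not None:
        return action
    # Step 2: no local match, so ask the remote recognizer for the spoken words.
    text = remote_recognize(sample_id)
    if text is None:
        # The sample may not contain a recognizable command at all.
        return None
    # Step 3: map the recognized command text to an action via the command file.
    return command_file.get(text.lower())
```

A caller would pass a stub such as `lambda s: "New Post"` as the hypothetical remote recognizer when exercising the fallback path.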

As to claim 6, the rejection of claim 1 is incorporated. Kam and Hu further teach wherein the command file comprises a file which defines a processing sequence (Kam: see ¶ 0068-0072; a user command set may be a set of particular commands, or a set of commands personalized for each user with respect to a sequence of consecutive commands, wherein the personalization may be performed based on keywords, phrases, and gestures, or any combination thereof. For example, different users may use different keywords, such as “search” or “click” as a particular command, instead of what would usually be the “click” command, for carrying out the operation of clicking a search button in a web page).  

As to claim 7, the rejection of claim 1 is incorporated. Kam and Hu further teach: wherein the first acquiring unit acquires, as the command, a result of voice recognition for a voice input to the application software (Kam: see Fig. 1 and ¶ 0036-0037; the command receiver 100 receives an input of a voice command regarding navigation of the screen).  

As to claim 8, the rejection of claim 7 is incorporated. Kam and Hu further teach 
 wherein the execution unit sets a processing parameter included in the command file based on the result of the voice recognition (Kam: see ¶ 0056; the command composer 300 may interpret the voice command based on the content analysis result from the screen analyzer 200 and then convert said voice command into a navigation command.  See Figs. 3A-3B and ¶ 0058-0065; the command composer includes a preprocessor 310 and a command converter 320; ¶ 0068; the command composer 300 may compose a navigation command from a voice command, using a command set database (DB) 350. The command set DB 350 may include a memory configured to be a part of the command composer 300 or separate.  ¶0069; The command set DB 350 may store command sets in the memory, where each command set is generated by mapping a common executable command (e.g., mouse click) that is carried out by the screen navigation apparatus 1 to a predefined keyword (e.g., search). As shown in FIG. 3C, the command set DB 350 may include a common command set DB 351 and/or a user command set DB 352.  ¶0070; common command sets may refer to command sets in which executable commands that commonly carried out in an operating system of the screen navigation apparatus 1 or on a basic platform for providing screen content are mapped with major keywords related to a voice command that is commonly input by users.  ¶0075; the command composer 300 may translate an input command, which may be a primary command and/or an additional command. Then the command composer 300 may refer to the command set DB 350 to extract a command that corresponds to the input command, and may create a navigation command using the extracted command. Here, if the referenced command is defined in the personalized user command set DB 352, the user may be able to create the navigation command more promptly. 
Before the user has finished inputting his or her voice command, i.e., while the user's voice command is still being input, the command composer 300 may translate the speech that is currently being inputted, and create a plurality of navigation commands to be executed in stages. Thus, the command composer 300 translates or extracts the user's speech in real time and may even predict a number of possible user commands based on the real time translation of the user's speech. As an example, in the case where the user regularly watches a weather program at a certain time of day, when the user begins to input a voice command, such as “run”, “search”, or “play”, the command composer 300 may extract the input in real time and predict the voice command to be “run a weather application,” “search for today's weather,” or “play the found program”, respectively. The command composer 300 may also base the prediction on the time of day the voice command is input. By predicting the voice command of the user, the command composer 300 may reduce the time for executing the voice command as compared to not predicting the command.  ¶ 0078; the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the users gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command.  
Hu: see Fig. 3 and ¶ 0023; when the mobile device receives an audio command from the user, it may then compare the received audio command with the stored audio samples to determine whether the audio command matches any of the stored audio samples.  See Fig. 8 and ¶ 0104; FIG. 8 depicts an example of an application commands file for an application, in accordance with some implementations. In some implementations, an application of a mobile device includes executable actions that are associated with command texts. For example, an email application of a mobile device is configured to respond to a command to compose a new message by opening up an interface for composing a new message. Similarly, every application includes a list of executable actions and the corresponding commands that trigger the executable actions. FIG. 8 contains an example of such a list for the Chatter® application. The voice command text column 810 lists the command texts that the application responds to, and the action column 820 lists the executable actions corresponding to each command text).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the feature of processing audio commands disclosed in Kam to include the feature of acquiring a command file as disclosed by Hu, so as to identify the command corresponding to the audio input as claimed.  One skilled in the art could have been motivated to make such a combination in order to determine the executable commands easily using the command file (Hu: see ¶ 0104).

As to claim 9, the rejection of claim 8 is incorporated. Kam and Hu further teach 
 wherein the execution unit performs display for prompting a user to input a voice corresponding to the process parameter if the processing parameter included in the command file cannot be set based on the result of voice recognition (Kam: see ¶ 0056; the command composer 300 may interpret the voice command based on the content analysis result from the screen analyzer 200 and then convert said voice command into a navigation command.  See Figs. 3A-3B and ¶ 0058-0065; the command composer includes a preprocessor 310 and a command converter 320; ¶ 0068; the command composer 300 may compose a navigation command from a voice command, using a command set database (DB) 350. The command set DB 350 may include a memory configured to be a part of the command composer 300 or separate.  ¶0069; The command set DB 350 may store command sets in the memory, where each command set is generated by mapping a common executable command (e.g., mouse click) that is carried out by the screen navigation apparatus 1 to a predefined keyword (e.g., search). As shown in FIG. 3C, the command set DB 350 may include a common command set DB 351 and/or a user command set DB 352.  ¶0070; common command sets may refer to command sets in which executable commands that commonly carried out in an operating system of the screen navigation apparatus 1 or on a basic platform for providing screen content are mapped with major keywords related to a voice command that is commonly input by users.  ¶0075; the command composer 300 may translate an input command, which may be a primary command and/or an additional command. Then the command composer 300 may refer to the command set DB 350 to extract a command that corresponds to the input command, and may create a navigation command using the extracted command. Here, if the referenced command is defined in the personalized user command set DB 352, the user may be able to create the navigation command more promptly. 
Before the user has finished inputting his or her voice command, i.e., while the user's voice command is still being input, the command composer 300 may translate the speech that is currently being inputted, and create a plurality of navigation commands to be executed in stages. Thus, the command composer 300 translates or extracts the user's speech in real time and may even predict a number of possible user commands based on the real time translation of the user's speech. As an example, in the case where the user regularly watches a weather program at a certain time of day, when the user begins to input a voice command, such as “run”, “search”, or “play”, the command composer 300 may extract the input in real time and predict the voice command to be “run a weather application,” “search for today's weather,” or “play the found program”, respectively. The command composer 300 may also base the prediction on the time of day the voice command is input. By predicting the voice command of the user, the command composer 300 may reduce the time for executing the voice command as compared to not predicting the command.  ¶ 0078; the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the users gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command).  

As to claim 10, the rejection of claim 8 is incorporated. Kam and Hu further teach 
 wherein the execution unit prompts the user to input a voice corresponding to the process parameter by a voice if the processing parameter included in the command file cannot be set based on the result of the voice recognition (Kam: see Fig. 4A and ¶ 0078; the index display 210 may display identification symbols such as grid lines 41, grid coordinates 42, grid points 43, or area 44, e.g., rectangles 44, or any combination thereof, on the screen. As described above, the index display 210 may determine types and colors of indices. The index display 210 may also determine the types, thicknesses, and sizes of lines to be displayed on the screen by taking into account various factors, such as the screen size, the resolution of the screen, and analysis results of contents. The identification symbols provide an index for user voice commands. Therefore, a user can designate desired content on the screen by including the index in a voice command. For example, a user may input “enlarge coordinate one one” and the content within the area indexed to (1,1) may be enlarged. The user may also input “enlarge grid one one” or “enlarge point one one” and the content within the area indexed to the grid or grid point, respectively, may be enlarged. Additionally, and using area 44 as only an example, an interaction or operation (or implementation of the same) with respect to an area 44, or content represented by area 44, may be contextually determined, such as through the context of one of or any combination of two or more of a gaze, gesture, content, or command. For example, the command receiver 100 may receive a user's command and a gaze and/or gesture, and the command composer 300 may interpret the user's gaze or gesture, or consider the user's gaze and/or gesture with respect to analyses of the corresponding content, to identify an area selected by the user to define the context for the command. 
As another example, the command receiver 100 may receive the user's command through a detected gesture, such as through a signaling or sign language, and use the user's gaze to provide context for the user's command. Alternatively, the command receiver 100 may receive the user's command through a detected gaze, e.g., where different gazes are predefined to correspond to particular commands, and use a detected gesture of the user, e.g., to identify one or more such grid identifiers, to provide context for the user's command).  

As to claim 11, the rejection of claim 7 is incorporated. Kam and Hu further teach 
wherein the first acquiring unit displays the result of the voice recognition (Kam: see Fig. 4A and ¶ 0010, 0078; The processor may further include a screen analyzer configured to analyze the content displayed on the screen and generate the content analysis result. The screen analyzer may be configured to analyze the content using one or more of the following techniques: source analysis, text analysis, speech recognition, image analysis and context information analysis. The content analysis result may include a semantic map or a screen index, or both, wherein the semantic map represents a determined meaning of the content displayed on the screen, and the screen index indicates a determined position of the content displayed on the screen. The screen index may include at least one of the following items: coordinates, grids, and identification symbols, and the screen analyzer determines at least one of a type, size, and position of the screen index to be displayed on the screen by taking into account at least one of the following factors: coordinates of the screen index, a screen resolution, and positions and distribution of key contents on the screen, and displays the screen index on the screen based on the determination. In response to a user selecting one of screen indices displayed on the screen by a user's speech, eye-gaze, or gesture, or any combination thereof, the command composer may be configured to interpret the voice command based on screen position information that corresponds to the selected screen index).  

As to claim 12, the rejection of claim 1 is incorporated. Kam and Hu further teach 
 wherein the first acquiring unit acquires, as the command, an input result by one of a key input, a gesture input, an input based on a sensing result by a sensor (Kam: see Fig. 1 and ¶ 0036-0037; the command receiver 100 receives an input of a voice command regarding navigation of the screen.  ¶ 0054, 0063; gesture).  

Conclusion

The prior art made of record on form PTO-892 and not relied upon is considered pertinent to applicant's disclosure.  Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.  For example: 
Choudhury et al. (US 2017/0236513 A1) – a system and a method of performing voice based actions by an electronic device, including receiving a voice command from a user; determining a relationship between the voice command and a context of a historic voice command of the user; and performing an action by executing the voice command based on the context of the historic voice command.

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUYETLIEN T TRAN whose telephone number is (571)270-1033.  The examiner can normally be reached on M-F: 8:00 AM - 8:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Renee Chavez, can be reached on 571-270-1104.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/TUYETLIEN T TRAN/ Primary Examiner, Art Unit 2179