Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Acknowledgement of Priority
Acknowledgement is made of applicant’s claim for foreign priority based on Korean application 10-2019-0094583 filed on 08/02/2019. A certified copy of the foreign priority document has been received.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 11-12, and 18-20 are rejected under 35 USC 103(a) as being unpatentable over Cunico et al. (US 2016/0239489 A1) and Altaf et al. (US 2018/0301141 A1).

Regarding Claims 1 and 20, Cunico discloses an intelligent presentation-assisting device (¶59 and Fig. 7, data processing system 700 configured to manage electronic meeting questions comprising system 100 components 110-126 shown in Fig. 1) comprising: 
(¶38, one or more microphones communicatively linked to the system 100); and 
a controller (¶59, processor) configured to: 
learn presentation content of a presentation presented by a presenter (¶30, as presenter performs live presentation, live presentation can be captured as multimedia content and communicated to system 100 in real time; ¶31, context indexer 112 creates an index of the received multimedia content by performing speech recognition, natural language processing, and semantic analysis to identify one or more respective concepts to a plurality of segments in the multimedia content), 
recognize a voice uttered by the presenter during presentation of the presentation content (¶30, as presenter performs live presentation, live presentation is captured as multimedia content including audio / video taken of the presenter), 
analyze, by an artificial intelligence processor, an intent of the presenter based on the presentation content and the voice (¶31, apply speech recognition, NLP, and semantic analysis on audio / video taken of the presenter as well as images and text presented by the presenter; ¶32, NLP is a field of artificial intelligence and semantic analysis is used with NLP to derive computer understandable meaning from natural language input), and 
execute an operation corresponding to the voice of the presenter based on the intent (¶33, recognize a concept using the speech recognition, NLP, and semantic analysis and determine when and where in the multimedia content the specific concept was discussed).
Cunico does not disclose that the voice uttered by the presenter is a command voice.
Altaf discloses an intelligent presentation assisting device (¶12, user agent 105 such as digital virtual assistant receives spoken request from a user and a natural language classification engine NLC 120 responds to the request) learning presentation content presented by a presenter (¶12-13, NLC 120 responds to the request using a NLC trainer 130 and a machine learning process 170 where NLC trainer 130 tags one or more ground truths (¶14, a ground truth has an utterance and an intent) of training data with respective context tags; ¶15, the utterance of ground truth is a speech / text presented by the users), recognizing a command voice uttered by the presenter during presentation and analyzing an intent of the presenter based on the presentation content and the command voice (¶16, user uttered “I want apple” while browsing electronics department, in between searches for mobile phone and smart watch, immediately after lookup of locations for Apple store, tag the utterance with tag “Electronics” and classify the utterance into intent “seeking information on Apple brand electronic devices”), and executing an operation corresponding to the command voice of the presenter based on the intent (¶16, respond according to the context tag “Electronics” and the intent “seeking information on Apple brand electronic devices”). 
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico to recognize a command voice uttered by the presenter during presentation of the presentation content and to analyze and respond to an intent of the presenter based on the presentation content and the command voice (compare Cunico, ¶33, using NLP and semantic analysis on presenter audio, video, image and text to determine multimedia content discussing a concept and meaning, with Altaf, ¶16, using natural language classification to determine presenter utterance “I want apple” according to the circumstances of user interaction to determine concept / context tag “Electronics” and classify an intent) in order to derive meaning from natural language input according to the circumstances of user interaction / presentation (Cunico, ¶32, NLP and semantic analysis to derive computer understandable meaning from natural language input; Altaf, ¶16, determine context tag / concept and intent of user presented speech / text according to the circumstances of user interaction).
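For illustration only, the context-dependent classification relied upon from Altaf ¶14 and ¶16 (a ground truth pairs an utterance with an intent and is tagged with a context tag) may be sketched as follows. This is a minimal sketch under stated assumptions and not an implementation taken from either reference: the names GroundTruth and classify_intent, the exact-match lookup, and the "Grocery" entry are hypothetical and are introduced only to show how the same utterance can resolve to different intents depending on the context tag.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GroundTruth:
    """Hypothetical record: an utterance, its context tag, and its intent."""
    utterance: str
    context_tag: str
    intent: str


# The "Electronics" entry mirrors the example cited from Altaf at paragraph 16.
# The "Grocery" entry is an assumption added only for contrast.
GROUND_TRUTHS = [
    GroundTruth("i want apple", "Electronics",
                "seeking information on Apple brand electronic devices"),
    GroundTruth("i want apple", "Grocery",
                "seeking information on apples (fruit)"),
]


def classify_intent(utterance: str, context_tag: str,
                    ground_truths: Sequence[GroundTruth] = GROUND_TRUTHS
                    ) -> Optional[str]:
    """Return the intent whose utterance and context tag both match, else None.

    A real classifier would be a trained model; exact matching is used here
    only to keep the sketch short and deterministic.
    """
    key = utterance.strip().lower()
    for gt in ground_truths:
        if gt.utterance == key and gt.context_tag == context_tag:
            return gt.intent
    return None
```

Under these assumptions, the utterance “I want apple” resolves to the electronics intent only when the circumstances of the interaction supply the context tag “Electronics”, and an unknown context yields no classification.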
Regarding Claim 2, Cunico discloses wherein the operation includes executing a function or responding with audio output or visual output (¶30, multimedia content includes audio, video, images, and text; in view of ¶29, user can access multimedia content to listen to and view the multimedia content; ¶37, select one or more segments of multimedia content to answer a question presented by a user). 
Regarding Claim 3, Cunico as modified by Altaf discloses recognizing text displayed on a screen as the presentation content (Cunico, ¶31, apply speech recognition, NLP, and semantic analysis on images and text presented by the presenter); and 
executing the operation corresponding to the command voice of the presenter based on the text and the intent (Cunico, ¶33 and ¶37, semantic analysis of images and text presented by the presenter are answers to user questions and select said segments for presentation to one or more users; Altaf, ¶16, using the circumstance of user interaction to perform natural language classification and tag utterance with an intent; e.g., tag presenter’s audio with an intent that a corresponding segment of images and text the presenter interacted with is the answer to user question). 
Regarding Claim 4, Cunico as modified by Altaf discloses converting the presentation content uttered by the presenter into text (Cunico, ¶31, apply speech recognition, NLP, and semantic analysis on images and text presented by the presenter); and 
executing the operation corresponding to the command voice based on the text and the intent (Cunico, ¶33 and ¶37, semantic analysis of images and text presented by the presenter are answers to user questions and select said segments for presentation to one or more users; Altaf, ¶16, using the circumstance of user interaction to perform natural language classification and tag utterance with an intent; e.g., tag presenter’s audio with an intent that a corresponding segment of images and text the presenter interacted with is the answer to user question). 
Regarding Claim 11, Cunico as modified by Altaf discloses applying the command voice of the presenter to a pre-trained command voice recognition, determination and classification model to generate an application result (Cunico ¶37, select one or more segments of multimedia content containing answer to a question presented by a user to the one or more users; i.e., perform NLP and semantic analysis on presenter audio command responding to participant / audience questions during presentation to determine segments of multimedia content contextually related to the recognized presenter per Cunico ¶31 using a natural language classification model taught by Altaf ¶14); 
determining whether a situation in which the command voice of the presenter is recognized correctly based on the application result to generate a determination result (Altaf ¶17, classify the utterance intent according to the circumstances of user interaction); and 
(Altaf ¶16, classify user intent based on ground truth contextualizer). 
Regarding Claim 12, Cunico as modified by Altaf discloses wherein the command voice recognition, determination and classification model is stored in an external artificial intelligence (AI) device (Altaf ¶49-50, cloud computing node 10 comprising computer system 12; ¶56, NLC engine 120 stored in memory 28 of computer system 12; compare Cunico, ¶67, implementation as a remote computer or server connected to user’s computer), and wherein the command voice recognition, determination and classification model is configured to: 
receive, from the intelligent presentation-assisting device, feature values related to information related to a situation in which the command voice of the presenter is recognized (Cunico, ¶30, multimedia content is communicated to the system 100 in real time as presenter performs live presentation; ¶31, receiving audio and video of the presenter and video, images, and text presented by the presenter as well as audio and video of participants / audiences; compare Altaf ¶16, the circumstances of user interaction); and 
transmit, from the external AI device, a result of applying the command voice of the presenter to the command voice recognition, determination and classification model, to the intelligent presentation-assisting device (Cunico ¶37, select one or more segments of multimedia content containing answer to a question presented by a user to the one or more users; i.e., perform NLP and semantic analysis on presenter audio command responding to participant / audience questions during presentation to determine segments of multimedia content contextually related to the recognized presenter per Cunico ¶31 using a natural language classification model taught by Altaf ¶14). 
Regarding Claim 18, Cunico discloses identifying a plurality of presentation page candidates based on the command voice and the presentation content presented by the presenter and providing information about the plurality of presentation page candidates to the presenter (¶55, answer user question by presenting selected segment to user comprising providing an indication of where the selected segments are located in the multimedia content; e.g., indicate an identifier for a slide or document page presented during the presentation; in view of ¶50, present the answer / selected segment to presenter for validation). 
Regarding Claim 19, Cunico discloses a server device for providing an intelligent presentation-assisting service (¶59 and ¶61, data processing system 700 implemented as a computer; ¶67, implementation as a remote computer or server connected to user’s computer), the server device comprising: 
a communication unit configured to communicate with an intelligent presentation-assisting device (¶61, network adapters 745 coupled to the data processing system 700 to enable coupling to client multimedia presentation devices through private or public networks; ¶67, server / remote computer may be connected to user’s computer through local area network or wide area network); and 
a controller configured to (¶59, data processing system 700 implementation with a processor): 
receive, from the intelligent presentation-assisting device, presentation content of a presentation presented by a presenter (¶30, as presenter performs live presentation, live presentation can be captured as multimedia content (includes audio, video, images, and text) and communicated to system 100 in real time by computers that captured the content presented on a display),
receive, from the intelligent presentation assisting device, voice data of the presenter while the presenter is presenting the presentation (¶30, the multimedia content includes audio taken of the presenter), 
recognize a voice uttered by the presenter during presentation of the presentation content (¶30, as presenter performs live presentation, live presentation is captured as multimedia content including audio / video taken of the presenter), 
analyze an intent of the presenter based on the presentation content and the voice to generate an analysis result (¶31, apply speech recognition, NLP, and semantic analysis on audio / video taken of the presenter as well as images and text presented by the presenter; ¶32, NLP is a field of artificial intelligence and semantic analysis is used with NLP to derive computer understandable meaning from natural language input), and 
transmit the analysis result to the intelligent presentation assisting device for executing an operation corresponding to the voice of the presenter based on the intent (¶33, recognize a concept using the speech recognition, NLP, and semantic analysis and determine when and where in the multimedia content the specific concept was discussed; ¶37, selecting one or more segments of multimedia content for presentation to one or more users).
Cunico does not disclose that the voice uttered by the presenter is a command voice.
Altaf discloses an intelligent presentation assisting device (¶12, user agent 105 such as digital virtual assistant receives spoken request from a user and a natural language classification engine NLC 120 responds to the request) learning presentation content (¶12-13, NLC 120 responds to the request using a NLC trainer 130 and a machine learning process 170 where NLC trainer 130 tags one or more ground truths (¶14, a ground truth has an utterance and an intent) of training data with respective context tags; ¶15, the utterance of ground truth is a speech / text presented by the users), recognizing a command voice uttered by the presenter during presentation and analyzing an intent of the presenter based on the presentation content and the command voice (¶16, user uttered “I want apple” while browsing electronics department, in between searches for mobile phone and smart watch, immediately after lookup of locations for Apple store, tag the utterance with tag “Electronics” and classify the utterance into intent “seeking information on Apple brand electronic devices”), and executing an operation corresponding to the command voice of the presenter based on the intent (¶16, respond according to the context tag “Electronics” and the intent “seeking information on Apple brand electronic devices”). 
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico to recognize a command voice uttered by the presenter during presentation of the presentation content and to analyze and respond to an intent of the presenter based on the presentation content and the command voice (compare Cunico, ¶33, using NLP and semantic analysis on presenter audio, video, image and text to determine multimedia content discussing a concept and meaning, with Altaf, ¶16, using natural language classification to determine presenter utterance “I want apple” according to the circumstances of user interaction to determine concept / context tag “Electronics” and classify an intent) in order to derive meaning from natural language input according to the circumstances of user interaction / presentation (Cunico, ¶32, NLP and semantic analysis to derive computer understandable meaning from natural language input; Altaf, ¶16, determine context tag / concept and intent of user presented speech / text according to the circumstances of user interaction) when transmitting the analysis result to the intelligent presentation assisting device for executing an operation corresponding to the voice of the presenter based on the intent (Cunico, ¶37, understanding that a selected segment of multimedia content was intended to answer a question presented by a user).
Claims 5-10 are rejected under 35 USC 103(a) as being unpatentable over Cunico et al. (US 2016/0239489 A1) and Altaf et al. (US 2018/0301141 A1) as applied to claim 1, in further view of Kashtan et al. (US 2016/0275952 A1).
Regarding Claims 5-8, Cunico discloses wherein the learning the presentation content includes: learning the presentation content by acquiring first voice data uttered by the presenter (¶31, apply speech recognition, NLP, and semantic analysis on audio / video taken of the presenter; ¶33, speech recognition, NLP, and semantic analysis of audio and video to determine that a specific concept was discussed).
Cunico does not disclose extracting first feature values of a first voice from the first voice data. 
Kashtan discloses a system comprising a computer system for supporting a multi-party communications session with a plurality of client devices associated with a plurality of participants / presenters (¶27-28, client devices 101-105 and computer system 110; ¶29, client device 101 shared by Presenters A-C taking turns speaking and presenting shared content), where the computer system acquires first voice data uttered by the presenter (¶36, computer system 110 receives audio data representing speech of a current speaker), extracts first feature values of a first voice from the first voice data (¶37, computer system 110 comprises a speaker recognition component 111 that processes the audio data to generate an audio fingerprint of the current speaker by extracting one or more speech features that may be used to characterize the voice of the current speaker; ¶55-58, stored audio fingerprints were audio fingerprints previously generated for a current speaker and stored in speaker fingerprint repository / directory / data storage), compares the first feature values of the first voice with the command voice of the presenter to generate a first comparison result (¶38, speaker recognition component 111 compares the audio fingerprint of the current speaker against stored audio fingerprints for individuals who have been previously recognized by computer system 110); and 
when the first comparison result is included in a set voice range, recognizing the command voice uttered by the presenter (¶38, speaker recognition component 111 compares the audio fingerprint of the current speaker against stored audio fingerprints for individuals who have been previously recognized by computer system 110 based on a stored audio fingerprint having a minimum distance close enough to make a positive identification); and 
when the first comparison result is not included in a set voice range, not recognizing the command voice uttered by the presenter (¶55, where the current speaker is not successfully recognized, present an indication / message that a speaker is unrecognized);
determining the presenter as a speaker (¶52, a particular recognized speaker is a lead presenter) and learning utterance content uttered from the speaker when the command voice of the presenter is not recognized, wherein the learning the utterance content uttered by the speaker includes learning the utterance content by acquiring second voice data uttered by the speaker (¶55, when speaker is unrecognized, computer system 110 may request one or more participants to supply tagging information such as a name or other suitable identity of the unrecognized speaker; per ¶62, computer system 110 may request and receive local audio fingerprints and local metadata from client devices that include speech and speaker recognition functionality when a presenter joins the online meeting; i.e., client device acquires second voice data uttered by speaker, extracts second features from the second voice data to generate local audio fingerprint, and shares the local fingerprint with computer system 110).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico to perform speaker recognition as taught by Kashtan in order to determine the presenter as an administrator or an authorized user (Cunico, ¶29, ¶35; compare Kashtan, ¶38, to recognize or attempt to recognize the current speaker).
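For illustration only, the speaker recognition relied upon from Kashtan ¶38 and ¶55 (compare the audio fingerprint of the current speaker against stored audio fingerprints and make a positive identification only when the closest stored fingerprint is close enough) may be sketched as follows. This is a minimal sketch under stated assumptions and not Kashtan's implementation: the representation of a fingerprint as a list of numbers, the Euclidean distance, the function names, and the max_distance parameter are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def fingerprint_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_speaker(current: Sequence[float],
                      stored: Dict[str, Sequence[float]],
                      max_distance: float) -> Optional[str]:
    """Return the name whose stored fingerprint is closest to `current`.

    Returns None when no stored fingerprint lies within `max_distance`,
    which corresponds to the "speaker is unrecognized" branch.
    """
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, reference in stored.items():
        d = fingerprint_distance(current, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    # Positive identification only if the minimum distance is close enough.
    return best_name if best_dist <= max_distance else None
```

Under these assumptions, a return value of None corresponds to the case in which the comparison result is not included in the set voice range, and a returned name corresponds to the case in which it is.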
Regarding Claim 9, the combination of Cunico and Altaf as modified by Kashtan discloses recognizing a second command voice of the speaker in the utterance content (Kashtan, ¶58, receive subsequent audio data representing speech of the current speaker during the current multi-party communications session; e.g., ¶52, documents / contents contextually related to a particular recognized speaker may be obtained and made available to participants; compare Cunico ¶37, select one or more segments of multimedia content containing answer to a question presented by a user to the one or more users; i.e., perform NLP and semantic analysis on presenter audio command responding to participant / audience questions during presentation to determine segments of multimedia content contextually related to the recognized presenter per Cunico ¶31); 
analyzing a second intent of the speaker corresponding to the second command voice (Cunico ¶31, perform NLP and semantic analysis on presenter audio; implement Altaf ¶16 to perform natural language classification on user utterance with an intent according to the circumstances of user interaction; i.e., presenter’s answer to audience user’s questions by selecting segments of multimedia content for presentation to the audience user); and 
executing a second operation corresponding to the second command voice of the speaker (Cunico ¶37, select segments of multimedia content containing an answer to a question presented by the user for presentation to the user). 
Regarding Claim 10, the combination of Cunico and Altaf as modified by Kashtan discloses wherein the recognizing the second command voice of the speaker includes: 
comparing the second feature values of the second voice with the second command voice of the speaker to generate a second comparison result and when the second comparison result is included in a set voice range, recognizing the second voice command of the speaker (Kashtan ¶58, computer system 110 receives subsequent audio data representing speech of current speaker, generates a new audio fingerprint of the current speaker based on the subsequent audio data, and compares the new audio fingerprint of the current speaker against the stored audio fingerprint of the current speaker to perform the function of speaker recognition at ¶38 to determine the stored audio fingerprint within minimum distance close enough to make a positive identification).
Claims 13-14 are rejected under 35 USC 103(a) as being unpatentable over Cunico et al. (US 2016/0239489 A1) and Altaf et al. (US 2018/0301141 A1) as applied to claim 11, in further view of Tsunomori (US 2021/0103619 A1).
Regarding Claims 13-14, Cunico-Altaf discloses wherein the command voice recognition, determination and classification model is stored in a network (Cunico, ¶27, content indexer 132, which performs NLP and semantic analysis per ¶31, is stored to data structures accessible over a network; compare Altaf, ¶51 and ¶56, NLC 120 stored on a cloud computer over a cloud network), wherein the command voice recognition, determination and classification model is configured to: 
receive, via the network, information related to a situation in which the command voice of the presenter is recognized (Cunico, ¶30, multimedia content is communicated to the system 100 in real time as presenter performs live presentation; ¶31, receiving audio and video of the presenter and video, images, and text presented by the presenter as well as audio and video of participants / audiences; compare Altaf ¶16, the circumstances of user interaction); and 
transmit, via the network, a result of applying the command voice of the presenter to the command voice recognition, determination and classification model (Cunico ¶37, select one or more segments of multimedia content containing answer to a question presented by a user to the one or more users; i.e., perform NLP and semantic analysis on presenter audio command responding to participant / audience questions during presentation to determine segments of multimedia content contextually related to the recognized presenter per Cunico ¶31 using a natural language classification model taught by Altaf ¶14).
Cunico-Altaf does not disclose where the network is a 5G network. 
Tsunomori teaches a dialog device providing command voice recognition, determination, and classification over a 5G network (Figs. 1-2, dialog device 100 connected to user terminal 50 over a network; ¶26-28, dialog device 100 provides services including analyzing user speech input information to extract words and features / characteristics to determine meaning and central topic; ¶114, network is a 5G network).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico-Altaf to communicate multimedia content and voice command recognition service over a 5G network in order to provide a next generation system (Tsunomori, ¶114).
Such 5G network implementation would implement receiving, via the 5G network, downlink control information (DCI) used to schedule transmission of the information related to the situation in which the command voice of the presenter is recognized (Tsunomori, ¶113, notification of information may be performed using physical layer signaling such as downlink control information (“DCI”); i.e., Cunico ¶31, communicating presenter audio, video, and video / image / text being presented as well as participant / audience audio / video), wherein the information related to the situation in which the command voice of the presenter is recognized is transmitted via the 5G network based on the DCI (Tsunomori, ¶113, notification of situation information may be performed using DCI). 
Claim 16 is rejected under 35 USC 103(a) as being unpatentable over Cunico et al. (US 2016/0239489 A1) in view of Altaf et al. (US 2018/0301141 A1) and Tsunomori (US 2021/0103619 A1) as applied to claim 14, in further view of Kashtan et al. (US 2016/0275952 A1).
Regarding Claim 16, Cunico as modified by Tsunomori discloses controlling a communication unit to transmit the information related to the situation in which the command voice of the presenter is recognized to an AI processor (Cunico, ¶30, multimedia content is communicated to the system 100 in real time as presenter performs live presentation; ¶31, system 100 comprises content index 132 to receive audio and video of the presenter and video, images, and text presented by the presenter as well as audio and video of participants / audiences to perform speech recognition, NLP, and semantic analysis; ¶32, NLP is an artificial intelligence computer process) included in the 5G network (Tsunomori, ¶114, 5G network).
The combination does not disclose controlling the communication unit to receive AI processed information from the AI processor, wherein the AI processed information is information indicating whether the command voice of the presenter is recognized.
Kashtan discloses a system comprising a computer system for supporting a multi-party communications session with a plurality of client devices associated with a plurality of participants / presenters (¶27-28, client devices 101-105 and computer system 110; ¶29, client device 101 shared by Presenters A-C taking turns speaking and presenting shared content), controlling a communication unit to transmit the information related to the situation in which the command voice of the presenter is recognized to a processor (¶36-37, computer system 110 receives audio data that represents speech of a current speaker), and controlling the communication unit to receive processed information indicating whether the command voice of the presenter is recognized from the processor (¶37-38, computer system 110 includes speaker recognition component 111 to process audio data and perform speaker recognition; ¶53, computer system 110 includes alert component 120 that transmits data to the client device to generate audible and visual alerts whenever the particular recognized speaker talks).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico-Altaf to control the communication unit to receive AI processed information indicating whether the command voice of the presenter is recognized from the AI processor in order to respond to a participant / audience request to identify a particular recognized speaker (Kashtan, ¶53).
Claim 17 is rejected under 35 USC 103(a) as being unpatentable over Cunico et al. (US 2016/0239489 A1) and Altaf et al. (US 2018/0301141 A1) as applied to claim 1, in further view of Stalnacke et al. (US 2009/0181659 A1).
Cunico does not disclose changing a page currently being presented to a next presentation page based on the command voice and the presentation content presented by the presenter.
Stalnacke discloses changing a page currently being presented to a next presentation page based on the command voice and a presentation content presented by the presenter (¶92).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement Cunico-Altaf to change a page currently being presented to a next presentation page based on the command voice and a presentation content presented by the presenter so as to log the activity of participant presenting the presentation content (Stalnacke, ¶92).
Allowable Subject Matter
Claim 15 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
US 8571851 B1 discloses a system for performing semantic interpretation using gaze order by obtaining data identifying a sequence of gaze attention dwell positions, obtaining a semantic description of elements displayed on a visual display, obtaining a transcription of an utterance, correlating the gaze attention dwell positions with the semantic description of elements to generate a sequence of one or more of the elements, and performing semantic interpretation of at least one term included in the transcription based at least on the sequence of the elements.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or 
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
11/12/2021