Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
			Response to Applicant’s Arguments
In response to “Thus, claim 1 refers specifically to capturing "voice data from a teleconference that the computing device is logged into or a speech being given by a user of the computing device." In this context, the claimed device will "output a prompt to the user of the computing device, while the user is speaking, the prompt ...comprising content relevant to the teleconference or speech to guide the user in deciding what to say next." (Claim 1) (emphasis added). In contrast to claim 1, Akkiraju is directed to a textual conversation rather than a spoken conversation, and does not "output a prompt to the user of the computing device, while the user is speaking." (Claim 1) (emphasis added)” and “There is no suggestion or teaching here of the claimed device to "output a prompt to the user of the computing device, while the user is speaking." (Claim 1) (emphasis added). To the contrary, this paragraph suggests that any prompt would come before a user is speaking when the turn of that user to add to the conversation is about to begin. For at least this reason, Akkiraju cannot meet the high standard for anticipating claim 1”.
With respect to the former, Akkiraju teaches “method 400 may initiate with operation 402, where textual data associated with a real-time conversation between a first participant and a second participant is received. In one embodiment, the textual data may be created from one or more spoken utterances. For example, a speech-to-text application may create the textual data from a spoken conversation between the first participant and the second participant” (¶58).
With respect to the latter, Akkiraju further teaches “Furthermore, in one embodiment, the current point in the real-time conversation may include a current turn for the first participant in the conversation. For example, the real-time conversation may include a turn-taking conversation where the first participant and the second participant speak one at a time in alternating turns” (¶64). 
Therefore, because “the current point” is a turn during which at least the first participant is speaking, the disclosures that “method 400 may proceed with operation 406, where a dialog act to be entered by the first participant at a current point in the real-time conversation that meets the objective is determined, utilizing a model” (¶63) and that “method 400 may proceed with operation 408, where the dialog act is returned to the first participant…returning the dialog act may include displaying the dialog act to the first participant” (¶69) teach at least determining and returning a dialog act to the first participant during the current turn in which the first participant is speaking.
As a result, Akkiraju teaches “output a prompt to the user of the computing device, while the user is speaking”.
In response to “This describes using "earlier conversations" to train a machine learning model to process a real-time conversation. The training data may be organized by time, topic, tone or emotion. This is the training data, not the current real-time conversation. Thus, there is no teaching or suggestion here of identifying "words in the voice data [that] comprise keywords regarding current events [in a current conversation], and the prompt comprises information regarding the [same] current events" in the current conversation. (Claim 5) (emphasis added). For at least this additional reason, the rejection of claim 5 should be reconsidered and withdrawn”. 
Akkiraju teaches “Also, in one embodiment, determining the impact may include applying the proposed dialog act to the model, where the model is trained using earlier conversations. For example, earlier conversations may include one or more of the first participant and the second participant. In another example, the model may be trained by identifying dialog acts, features, and objectives of the earlier conversations. In yet another example, the features may include one or more of n-grams (e.g., sequences of one or more words) within the textual data, temporal or topic based information, tone/emotion, etc. In still another example, the model may identify dialog acts, features, and the objective of the real-time conversation” (¶83). 
Since the “features” of the real-time conversation include n-grams within the textual data produced by the speech-to-text process as well as topic-based information, and the proposed dialog act (i.e., the prompt) is determined based on those features of the real-time conversation, Akkiraju teaches “the prompt comprises information regarding the current events” because the proposed dialog act is to be entered by at least the first participant at a current point in the real-time conversation and would meet an objective indicating a desired outcome of the real-time conversation for the first participant (¶¶62-63).
In response to “Thus, Raniere clearly does not teach or suggest claim 6, viewed as a whole. Raniere does not teach or suggest to "capture voice data for a specific user from a teleconference that the computing device is logged into or a speech being given by the specific user" and, with the voice data from that specific teleconference or speech "tally a number of times a particular word is used by the specific user in the voice data; and based on the tally, alert the specific user when the number of times reaches a limit and prompt the user with an alternative word choice." (Claim 6) (emphasis added)”.
Raniere teaches “In yet other embodiments, user input may additionally include audio input 311” (¶29) and “In some embodiments, the computing system 101 may be capable of detecting the voice signature of the user. By determining the user's voice signature or differentiating multiple user voices, the computing system 101 may be capable of separately analyzing each user's activities and provide more custom tailored content for that specific user, by forgoing audio attributed to non-users. In some embodiments multiple computing devices may be networked together or share analysis with other networked devices. For example a mobile phone and a desktop computer may communicate user input with each other. The mobile phone may record telephone conversations and mobile network internet viewing information and share it with the desktop computer which may result in unified custom user experience and specific content across the spectrum of the user's devices” (¶30).
In one particular example, “the received user input may be a conversation between the user and another individual, wherein the text suggests that the surrounding keywords are important or urgent such as "I really need to study for the upcoming biology exam." In response to this, the computing system may identify biology notes stored on the computing system and research helpful information for studying purposes” (¶55).
Therefore, the speech from which the keywords were parsed includes a conversation between a specific user and another individual. When the computing system calculates the frequency of the keyword (i.e., tallies the number of times the keyword is used by the specific user in the conversation), alerts the user to the overuse of the word, and supplies a list of suggested alternatives, Raniere teaches “capture voice data for a specific user from a teleconference that the computing device is logged into or a speech being given by the specific user” and “tally a number of times a particular word is used by the specific user in the voice data; and based on the tally, alert the specific user when the number of times reaches a limit and prompt the user with an alternative word choice”.
In response to “It must be noted that claim 11 does not refer to "speech" generally, but to "a speech." (Emphasis added). This article "a" indicates that the "speech" is an event in the form of a presentation or public speaking event. In contrast, as noted above, Raniere is silent about capturing voice data from a particular speech given by a specific user”. 
Nowhere in claim 11 is there a requirement that “a speech” must be an event in the form of a presentation or a public speaking event. All that claim 11 requires is “capture the voice data from a speech being given by the specific user with a microphone of the computing device”. Raniere makes it clear that I/O interface 109 is a microphone that receives inputs generated by human beings. This is enough to teach “capture the voice data from a speech being given by the specific user with a microphone of the computing device”.
In response to “As highlighted above, in the context of a teleconference, the method compiles "spoken" information about a first participant and then displays that information to a second participant. This is not taught or suggested by Redfern”. 
Redfern teaches “In some embodiments, the identifying (operation 112) is based on context information associated with a conversation between the individual and a second individual. For example, key word analysis of the conversation may allow a topic or an entity (such as an organization or company) to be identified, which may be used to limit possible candidates during speech or voice identification of the individual. Similarly, lexicography of the conversation may allow a native language of the individual to be identified, which may also be used to limit possible candidates during speech or voice identification of the individual. In some embodiments, the context information includes a location of the individual. For example, based on the location of an electronic device being used by the individual (which may be determined by a cellular-telephone network or a Global Positioning System), if the name `John Smith` is stated during a conversation, this name may be used in conjunction with the location when accessing the content in the social graph (see below) to determine which John Smith is being discussed. Alternatively or additionally, the pre-defined voice print may include information specifying pronunciation of the individual's name, and the identifying may be based on the pronunciation” (¶18), “Next, the computer system provides the content (operation 118). For example, the content may include business information associated with the individual, such as: contact information, education information, a job title, an organization or company associated with the individual, skills of or attributes associated with the individual, and/or connections of the individual to other individuals” (¶21), and “Providing the content (operation 118) may be based on the emotional state and/or the situational state determined in operation 116…Thus, the emotional state and the situational state may be used to generate the metric, such as a graphical symbol (e.g., a storm cloud for a `bad` day)” (¶22) where “The computer system may optionally determine an emotional state of the individual (such as angry or frustrated) and/or a situational state of the individual based on the signal (operation 116). For example, the emotional state may be determined based on intonations, temporal spacing between words, voice stress in the vocal signal” (¶20).
Therefore, Redfern compiles "spoken" information about a first participant and then displays that information to a second participant because content about “John Smith” is provided to other individuals (i.e., the second participant) together with compiled spoken information, such as a graphical symbol “storm cloud” indicating that “John Smith” is having a bad day based on John Smith’s intonations, temporal spacing between words, and voice stress in the vocal signal.
In response to “In contrast, the Action cites to Redfern as describing to "access individual's calendar to determine how busy the individual is." (Action, p. 7). Clearly, this not compiling spoken information from the voice data of a specific teleconference "wherein the compiled information comprises availability of the first participant with regards to scheduling information”.
When the graphical symbol “storm cloud,” which is compiled from John Smith’s speech and indicates that “John Smith” is having a bad day, is placed next to John Smith’s calendar / scheduling information and provided to other individuals, that indication of John Smith having a bad day would at least convey the availability or non-availability of “John Smith” to schedule a meeting with other individuals.
In response to “Thus, if the references are combined, the result would be, in the textual embodiment of Akkiraju, the use of the autofill feature described by Caskey. The combination does not teach or suggest "wherein the captured voice data comprises a sentence currently being spoken by the user, and the prompt comprises a word or phrase to complete the sentence."”.
Caskey teaches that, given a current word or phrase entered by a user via a speech recognition device, akin to the speech-to-text processing of Akkiraju (¶58, speech-to-text application), the next n-1 words are predicted to allow the user to auto-complete sentences (Col 1, Rows 47-60 and Col 3, Rows 27-30).
Akkiraju requires a determination of whether a proposed dialog act will change a probability of the objective of the first participant being achieved during the current point in the real-time conversation (¶70, ¶¶80-81, and ¶129) and sending a notification allowing the proposed dialog act received from the first participant to be entered in the current point of the conversation (¶84).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement the notification allowing the proposed dialog act received from the first participant to be entered in the current point of the conversation with a word or phrase to complete the sentence in order to minimize the number of words entered (Caskey, Col 1, Rows 59-60).
In response to “Additionally, claim 9 recites: "The method of claim 12, wherein the compiled information includes a link or preview to a file mentioned by the first participant." (Emphasis added). Also, as discussed above with respect to independent claim 12, from which claim 9 depends, the "compiled information" is spoken information” and “Here, the computing system is performing a search to identify links. This is not the claimed "compiled information [that] includes a link or preview to a file mentioned by the first participant." (Emphasis added). For at least this additional reason, the rejection of claim 9 should be reconsidered and withdrawn”.
As previously noted, “the received user input may be a conversation between the user and another individual” (Raniere, ¶55). Therefore, the teachings “a computing system may receive user input which may be parsed for keywords relating to space travel and exploration. In response, the computing system may provide encyclopedia entries and websites associated with space exploration to the user” and “Using the space exploration example above, instead of the computing system bringing webpages or encyclopedic entries themselves to the user's attention, the active search software may present the user with a prompt suggesting the user check out the following links and within each link may be the websites or encyclopedic entries among other destinations” (Raniere, ¶62) teach compiling, from spoken information, at least a link to a website or encyclopedic entry consistent with what was mentioned by a participant.
/RICHARD Z ZHU/            Primary Examiner, Art Unit 2675