DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 01/12/2022.
Claims 1-15 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 7, and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. More specifically, the new ground of rejection addresses the claim limitations reciting “extract the first portion of the text from the first text file; generate a second text file to include the first portion of the text separate from the first text file; ... extract the second portion of the text from the first text file based on the fourth command, modify the second text file to include the second portion of the text; and store the second text file to be accessed by the user” (quoted from Applicant’s Reply to Office Action Dated August 6, 2021, Application No. 16/500,373). Please see the new mappings below for further detail.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Duncan et al. (US Patent No. 7107533), hereinafter Duncan, in view of Shih et al. (U.S. PG Pub No. 2010/0153114), hereinafter Shih, and further in view of Kahn et al. (U.S. PG Pub No. 2007/0244702), hereinafter Kahn.

Regarding claim 1, Duncan teaches
A computing device (an electronic book device, i.e. computing device (3:4)), comprising:
a speaker to output audio signals (the book includes a second output device, such as an audio speaker, i.e. a speaker, that presents content in audio form, i.e. output audio signals (3:18-24,34-36));
a microphone to receive audio signals (an input device, i.e. receive, that includes a microphone, i.e. microphone, as part of an audio user interface, i.e. audio signals (3:18-26));
a memory that stores instructions and a first text file (a data storage device, i.e. memory, that may contain computer-executable instructions, i.e. instructions (3:39-; and
a processor that executes the instructions to (a microprocessor, i.e. processor, executes the computer-executable instructions (3:27-4:1)):
receive a first command from a user to read text from the first text file (a voice command, i.e. first command, from a user, such as “begin speak”, can invoke recitation of an electronic book which has text content, i.e. read the text from the first text file (4:23-28),(7:16-19), where the audio user interface receives the user command, i.e. receive (5:44-48));
determine a start position for reading the text (the abstract interface stores information such as the current position in the content being rendered, or the start position of the text being displayed, i.e. determine a start position, where displaying the text may occur by way of audio output via a speaker, i.e. reading the text (5:55-67));
output, via the speaker ..., an audio reading of the text to the user beginning at the start position (recitation may be invoked, i.e. output an audio, where the system begins reading the current page from the beginning, i.e. reading of the text to the user beginning at the start position (5:55-67),(7:4-19), where the audio speaker presents content, i.e. output, via the speaker (3:18-24,34-36));
receive a second command from the user to provide a comment (a user clicks on an annotation button, i.e. receive a second command, to put the system into text-note mode for entering an annotation, i.e. provide a comment (8:3-8));
record, via the microphone, the comment provided by the user at a current reading position in the text (a text-note mode provides the user with a text box in ;
receive a third command from the user to format the text in the first text file, wherein the third command is a voice command received via the microphone (presentation buttons can be invoked by a click, i.e. receive a third command from the user, that can control the size of the fonts used to display the text, i.e. format the text in the first text file (7:16-21), and where the user commands able to be input in graphics-related ways can also be input using the voice recognition module that has a vocabulary for recognizing the command, i.e. command is a voice command (Fig 3),(6:45-51), and where the voice recognition module includes a microphone, i.e. received via the microphone (3:21-24));
modify at least one format characteristic of at least a first portion of the text in the first text file based on the third command received from the user (presentation buttons control the size of the font, i.e. modify at least one format characteristic, used to display the text, i.e. at least a first portion of the text, where the presentation buttons can be invoked by a click or a voice command, i.e. based on the third command received from the user (Fig 3),(6:45-51),(7:16-21));
receive a fourth command from the user to capture a second portion of the text in the first text file (a user can click or drag over text, i.e. receive a fourth command from the user, to select a block of text, i.e. capture a second portion of the text in the first text file (7:60-8:1), and where the user commands able to be input in graphics-related ways can also be input using the voice recognition module that has a vocabulary for recognizing the command (Fig 3),(6:45-51)).
While Duncan provides the synthesis of speech, Duncan does not specifically teach that the resulting speech is stored as an audio file, and thus does not teach
convert the first text file to an audio file;
output, ... using the audio file, an audio reading of the text...;
Shih, however, teaches convert the first text file to an audio file (email can be extracted and converted into an audio file for listening [0023:1-3]);
output, ... using the audio file, an audio reading of the text... (the audio file, i.e. using the audio file, of the converted email is sent to the client for play, i.e. output...an audio reading of the text [0023:1-3],[0024]).
Duncan and Shih are analogous art because they are from a similar field of endeavor in enabling voice control of a system that audibly reads text for a user. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speech synthesis teachings of Duncan with the specific conversion of text into an audio file as taught by Shih. The motivation to do so would have been to achieve a predictable result of enabling the user to control play of an audio document, including reading specific portions of the document (Shih [0004]).
While Duncan in view of Shih provides the creation of annotations and comments related to a text file, Duncan in view of Shih does not specifically teach the creation of a separate text file that includes text from the original text file, and thus does not teach
extract the first portion of the text from the first text file;
generate a second text file to include the first portion of the text separate from the first text file;
extract the second portion of the text from the first text file based on the fourth command;
modify the second text file to include the second portion of the text; and
store the second text file to be accessed by the user.
Kahn, however, teaches extract the first portion of the text from the first text file (during review of a transcribed session file, i.e. first text file, an operator may select text, i.e. first portion of the text, which may be further extracted during post processing, i.e. extract [0074]);
generate a second text file to include the first portion of the text separate from the first text file (post processing may include data extraction from the document itself, i.e. first text file, where the extracted data may be reassembled into a document or report, i.e. generate a second text file to include the first portion of the text [0092]);
extract the second portion of the text from the first text file based on the fourth command (during review of a transcribed session file, i.e. first text file, an operator may select text, i.e. second portion of the text...based on the fourth command, which may be further extracted during post processing, i.e. extract [0074]);
modify the second text file to include the second portion of the text (post processing may include data extraction from the document itself, where the extracted ; and
store the second text file to be accessed by the user (the extracted data may be reassembled into a document or report [0092], where postprocessed files, i.e. second text file, may be saved, i.e. store, for distribution to human end users, i.e. accessed by the user [0050]).  
Duncan, Shih, and Kahn are analogous art because they are from a similar field of endeavor in enabling a user to create transcriptions, annotations, and text-to-speech audio. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the creation of annotations and comments related to a text file teachings of Duncan, as modified by Shih, with the extraction of text from the session document into another report as taught by Kahn. The motivation to do so would have been to achieve a predictable result of enabling the saving of data into a report in a database for later distribution to other users (Kahn [0050]).

Regarding claim 2, Duncan in view of Shih and Kahn teaches claim 1, and Duncan further teaches
the first command is a voice command received via the microphone from the user to initiate the audio reading of the text (a voice command, i.e. first command is a voice command, from a user, such as “begin speak”, can invoke recitation of an electronic book which has text content, i.e. initiate the audio reading of the text (4:23-28),(7:16-19), where the audio user interface receives the user command .  

Regarding claim 3, Duncan in view of Shih and Kahn teaches claim 1, and Shih further teaches
the start position for reading the text is identified in the first command (voice commands for playback, i.e. reading the text, can include locations in the text, such as “Repeat Sentence”, “Repeat Paragraph”, “Next Paragraph”, “Next Chapter”, “Page N”, or “Restart”, where the playback will jump to the location in the command and start reading, i.e. start position…is identified in the first command [0035-6]).  
The motivation to combine is the same as previously presented.

Regarding claim 4, Duncan in view of Shih and Kahn teaches claim 1, and Duncan further teaches
	the second command is a voice command received via the microphone from the user to input an audible comment (a user speaks a command such as “annotate”, i.e. second command is a voice command…from the user, followed by a spoken annotation, i.e. audible comment (7:50-59), and where an input device includes a microphone as part of an audio user interface, i.e. received via the microphone (3:18-26)).

Regarding claim 5, Duncan in view of Shih and Kahn teaches claim 1, and Duncan further teaches
the fourth command is a voice command received via the microphone to capture the second portion of the text (a user can click or drag over text, i.e. a fourth command, to select a block of text, i.e. capture the second portion of the text (7:60-8:1), and where the user commands able to be input in graphics-related ways can also be input using the voice recognition module that has a vocabulary for recognizing the command, i.e. the fourth command is a voice command (Fig 3),(6:45-51), and where an input device includes a microphone as part of an audio user interface, i.e. received via the microphone (3:18-26)).  

Regarding claim 6, Duncan in view of Shih and Kahn teaches claim 1, and Kahn further teaches
provide the second text file to the user (the extracted data may be reassembled into a document or report [0092], where postprocessed files, i.e. second text file, may be saved for distribution to human end users, i.e. provide...to the user [0050]).  
The motivation to combine is the same as previously presented.

Claim(s) 7-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Duncan, in view of Kurzweil et al. (U.S. PG Pub No. 2011/0288861), hereinafter Kurzweil, and further in view of Kahn.

Regarding claim 7, Duncan teaches
A method (a method for presenting content (2:33-35)), comprising:
converting text to an audio version and a speech marks file that includes a plurality of speech marks that map between ... positions of portions in the audio version and text locations of corresponding portions in the text (a graphics output thread and audio output thread are run simultaneously so that each thread is at the same location in the content as the other thread, i.e. plurality of speech marks that map between ... positions of portions in the audio version and text locations of corresponding portions in the text (2:41-49), where the audio speech thread is created from the text content of the current page as seen on the screen, i.e. converting text to an audio version (11:16-35));
providing the audio version to a user device (the audio output thread can be displayed to the user, i.e. providing the audio version, by an audio speaker included in the device, i.e. user device (1:56-59),(2:41-49) for outputting content);
receiving, from the user device, a plurality of input commands for a plurality of highlight or vocabulary events, each event of the plurality of highlight or vocabulary events includes a corresponding event time position associated with the audio version (a user highlights text and clicks on an annotation button of a device, i.e. receiving, from the user device, where different types of annotations can be created, i.e. a plurality of input commands for a plurality of …vocabulary events, to put the system into text-note mode for entering an annotation (7:55-8:8), where an annotation is associated with a particular location in the text as represented by a highlight starting and ending offset, i.e. each event of the plurality of...vocabulary events includes a corresponding event time position (9:35-67), and when a chapter file is read, the corresponding annotation file is read, where the graphics output thread and audio ;
in response to receiving an input command:
identifying a corresponding text location in the text for a corresponding highlight or vocabulary event based on the corresponding event time positions and the plurality of speech mark mappings ... (when a user highlights text and clicks on an annotation button of a device, i.e. in response to receiving an input command...for a corresponding...vocabulary event, to put the system into text-note mode for entering an annotation (7:55-8:8), where an annotation is associated with a particular location in the text as represented by a highlight starting and ending offset, i.e. identifying a corresponding text location in the text based on the corresponding event time positions (9:35-67), and when a chapter file is read, the corresponding annotation file is read, where the graphics output thread and audio output thread are run simultaneously so that each thread is at the same location in the content as the other thread, i.e. based on the...plurality of speech mark mappings... (2:41-49),(9:65-67)).
While Duncan provides the synchronization of audio and text files, Duncan does not specifically teach the synchronization is based on time or a separate file containing timing information for the audio, and thus does not teach
speech marks file that includes a plurality of speech marks that map between time positions of portions in the audio version and text locations of corresponding portions in the text.
Kurzweil, however, teaches speech marks file that includes a plurality of speech marks that map between time positions of portions in the audio version and text locations of corresponding portions in the text (the system compares the words in the speech recognition output, such as from a recording of an audio book, i.e. portions in the audio version, to the words in the original text, i.e. text locations of corresponding portions in the text, and if the expected word from the original text matches the recognized word, the word is output with the time of recognition, i.e. speech marks that map the time positions, to a timing file, i.e. speech marks file, where the process is repeated for each word in the original text, i.e. a plurality [0073]).
Duncan and Kurzweil are analogous art because they are from a similar field of endeavor in providing narration of text. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the synchronization of audio and text files teachings of Duncan with the specific use of time and a timing file as taught by Kurzweil. The motivation to do so would have been to achieve a predictable result of enabling synchronization of a text display with the narration, such as an option to highlight a phrase or sentence as it is spoken (Kurzweil [0073]).
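For illustration, a timing file of the kind described in Kurzweil [0073] could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited reference: the names `SpeechMark` and `build_speech_marks` are hypothetical, the recognizer is assumed to emit exactly one (word, time) pair per word of the original text, and a mismatched word is simply left without a mark.

```python
from dataclasses import dataclass


@dataclass
class SpeechMark:
    time_ms: int      # time position of the word in the audio version
    char_offset: int  # location of the word in the original text
    word: str         # the word as it appears in the original text


def build_speech_marks(text, recognized):
    """Pair each expected word with its recognition time (hypothetical helper).

    `recognized` is assumed to be a list of (word, time_ms) tuples, one per
    whitespace-separated word of `text`, in reading order.
    """
    marks = []
    search_from = 0
    for expected, (heard, time_ms) in zip(text.split(), recognized):
        # Locate the expected word in the text, scanning left to right.
        start = text.index(expected, search_from)
        search_from = start + len(expected)
        # Emit a mark only when the recognized word matches the expected word.
        if expected.strip(".,;:!?").lower() == heard.lower():
            marks.append(SpeechMark(time_ms, start, expected))
    return marks


marks = build_speech_marks("The cat sat.", [("the", 0), ("cat", 400), ("sat", 900)])
```

Each resulting entry ties one time position in the audio to one text location, which is the mapping the claim attributes to a speech mark.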
While Duncan in view of Kurzweil provides the creation of annotations and comments related to a text file, Duncan in view of Kurzweil does not specifically teach the creation of a separate text file that includes text from the original text file, and thus does not teach
extracting a text portion from the text based on the corresponding text location for the corresponding highlight or vocabulary event;
generating a separate document with the extracted text portions independent of the text; and
providing the separate document to the user device.
Kahn, however, teaches extracting a text portion from the text based on the corresponding text location for the corresponding highlight or vocabulary event (during review of a transcribed session file, i.e. text, an operator may select text, i.e. a text portion... based on the corresponding text location for the corresponding highlight or vocabulary event, which may be further extracted during post processing, i.e. extract [0074]);
generating a separate document with the extracted text portions independent of the text (post processing may include data extraction from the document itself, i.e. text, where the extracted data, i.e. extracted text portions independent of the text, may be reassembled into a document or report, i.e. generating a separate document [0092]); and
 providing the separate document to the user device (the extracted data may be reassembled into a document or report [0092], where postprocessed files, i.e. separate document, may be saved for distribution to human end users, i.e. provide...to the user [0050], and where the system functions on a computer to provide output, i.e. user device [0035-6]).
Duncan, Kurzweil, and Kahn are analogous art because they are from a similar field of endeavor in enabling a user to create transcriptions, annotations, and text-to-speech audio. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the creation of annotations and comments related to a text file teachings of Duncan, as modified by Kurzweil, with the extraction of text from the session document into another report as taught by Kahn. The motivation to do so would have been to achieve a predictable result of enabling the saving of data into a report in a database for later distribution to other users (Kahn [0050]).

Regarding claim 8, Duncan in view of Kurzweil and Kahn teaches claim 7, and Duncan further teaches
receiving the text or a selection of the text from the user device (the annotation is associated with the text clicked on or dragged over, i.e. selection of the text, by the user through the GUI, i.e. receiving…from the user device (7:55-8:6)).  

Regarding claim 9, Duncan in view of Kurzweil and Kahn teaches claim 7, and Duncan further teaches
receiving at least one voice command from a user of the user device to obtain a portion of the text for the document (user input commands from the audio interface can be received, i.e. receiving at least one voice command from a user of the user device, and will update the graphics user interface in response (2:12-21), where a grammar for commands can include phrases such as “find (a word/a passage/ a phrase)”, i.e. obtain a portion of the text for the document (10:35-60)).  

Regarding claim 10, Duncan in view of Kurzweil and Kahn teaches claim 7, and Duncan further teaches 
identifying each time position in the plurality of speech marks that matches the corresponding event time positions (a graphics output thread and ;
Where Kurzweil teaches that the location in an audio representation is specifically a time value associated with a specific word in the original text [0073].
determining corresponding text locations in the text that map to the identified time (commands, such as the spoken command “annotate”, are processed by the abstract interface to update the other interfaces, associating the annotation with the portion of the content being displayed, where displaying refers to visual, i.e. determining corresponding text locations in the text, and audio output (5:55-67),(7:50-59), and where the abstract interface stores information such as the current position in the content being rendered, where rendering may be in a display or through the speaker, i.e. map to each identified time position, (5:55-67),(6:26-29)); and
identifying a number of sentences or a word in the text to be extracted from the text based on the determined corresponding text locations (commands can include phrases such as “find (a word/a passage/ a phrase)”, i.e. identifying a number of sentences or a word in the text (10:35-60), and the spoken command .  
Where Kahn teaches the extraction of text during post processing [0074].
The motivation to combine is the same as previously presented.
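For illustration, the three steps recited in claim 10 (matching an event time to a speech mark, mapping it to a text location, and identifying the sentence to extract) could be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from any cited reference: `locate` and `sentence_at` are hypothetical helpers, a "match" is assumed to mean the latest speech mark at or before the event time, and sentences are assumed to end at a period.

```python
import bisect


def locate(marks, event_time_ms):
    """Return the text offset of the latest mark at or before the event time.

    `marks` is assumed to be a list of (time_ms, char_offset) tuples sorted
    by time.
    """
    times = [t for t, _ in marks]
    i = bisect.bisect_right(times, event_time_ms) - 1
    return marks[max(i, 0)][1]


def sentence_at(text, offset):
    """Return the sentence containing `offset`, assuming '.' ends a sentence."""
    start = text.rfind(".", 0, offset) + 1
    end = text.find(".", offset)
    end = len(text) if end == -1 else end + 1
    return text[start:end].strip()


text = "Cats purr. Dogs bark loudly. Birds sing."
marks = [(0, 0), (500, 5), (1000, 11), (1500, 16), (2000, 21), (2500, 29), (3000, 35)]
offset = locate(marks, 1700)           # event raised 1.7 s into the audio
extracted = sentence_at(text, offset)  # sentence surrounding that location
```

A binary search keeps the lookup cheap when a long text yields many speech marks; a linear scan would work equally well for short passages.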

Claim(s) 11 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Quidilig et al. (U.S. PG Pub No. 2012/0004910), hereinafter Quidilig, in view of Lee et al. (U.S. PG Pub No. 2013/0311186), hereinafter Lee, and further in view of Kahn.

Regarding claim 11, Quidilig teaches
A system, comprising (a system [0010]):
a user device that includes (the user connects to the server of the system using one of a number of devices, such as a telephone, a cellphone, or a computer, i.e. user device [0027]):
record an audio file…(the user speaks into a device such as the cellular telephone, and the sound is converted into a stream of digitized electrical signals, i.e. record an audio file [0037:8-15])
receive a plurality of input commands from a user identifying a plurality of highlight or vocabulary events associated with the audio file (as the user listens to the echo audio stream, i.e. associated with the audio file, corrections can be made, ; and
 for each input command of the plurality of input commands, determining a corresponding event time position in the audio file (the user command, i.e. for each input command of the plurality of input commands, to make a correction to the echo audio stream is associated with a particular audio segment from the echo audio stream, i.e. determining a corresponding event time position in the audio file [0069]); and
a server device that includes (the network connects the server, i.e. server device, to a plurality of users [0027]):
a second memory that stores second instructions (the server includes program code storage, i.e. second memory, that includes instructions, i.e. second instructions [0087]); and
a second processor that executes the second instructions to (the server includes a processor, i.e. second processor, that executes instructions, i.e. second instructions [0087]):
receive the audio file from the user device (the input audio stream input to the user’s cellular telephone, i.e. audio file from the user device, is sent over the network to the server, i.e. receive [0037:8-15]);
receive, from the user device, the corresponding event time position for each of the plurality of highlight or vocabulary events associated with the plurality of input commands (editing commands, i.e. each of the plurality of highlight .
While Quidilig provides the input of audio and edit commands at a user device and sending the information to a server for further processing, Quidilig does not specifically teach splitting the audio file into separate audio files for each input command, or the features of the user device, and thus does not teach
a microphone to receive audio signals;
a first memory that stores first instructions;
a first processor that executes the first instructions to:
...via a microphone
split the audio file into a plurality of separate audio files for the plurality of highlight or vocabulary events based on the corresponding event time position in the audio file for each of the plurality of highlight or vocabulary events;
convert the plurality of separate audio files into a plurality of separate text files.  
Lee, however, teaches a microphone to receive audio signals...via a microphone (the AV input unit of a mobile terminal may include a microphone for receiving external audio signals [0402],[0415],[0417]);
a first memory that stores first instructions (the mobile terminal includes a memory, i.e. first memory, that may store a program, i.e. first instructions [0402],[0439]);
a first processor that executes the first instructions to (a method may be implemented as codes, i.e. first instructions, readable by a processor, i.e. first processor [0519]):
split the audio file into a plurality of separate audio files for the plurality of ...events based on the corresponding event time position in the audio file for each of the plurality of ... events (user may select an audio section, such as selecting a portion of a progress bar corresponding to the audio file, i.e. event…based on the corresponding event time position in the audio file for each…event [0298], and a separate audio file is generated from the partial audio section, i.e. split the audio file into a plurality of separate audio files [0307]);
Where Quidilig teaches that the event is a highlight or vocabulary event [0069].
convert the plurality of separate audio files into a plurality of separate text files (when storing an audio file, i.e. plurality of separate audio files, a text file containing the text generated by STT may be also stored along with the audio file, i.e. convert…into a plurality of separate text files [0184]).
Quidilig and Lee are analogous art because they are from a similar field of endeavor in enabling a user to edit dictated information. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the input of audio and edit commands at a user device and sending the information to a server for further processing teachings of Quidilig with the generation of new audio files based on selected content as taught by Lee. The motivation to do so would have been to achieve a predictable result of enabling a user 
While Quidilig in view of Lee provides the generation of a separate audio file and subsequent conversion into text, Quidilig in view of Lee does not specifically teach the creation of notes from the text files, and thus does not teach
extract text from each of the plurality of separate text files that were generated based on the plurality of input commands;
generate a document that combines the extracted text from the plurality of separate text files; and
provide the document to the user device.
Kahn, however, teaches extract text from each of the plurality of separate text files that were generated based on the plurality of input commands (during review of one or more transcribed session files, i.e. each of the plurality of separate text files that were generated, an operator may select text which may be further extracted during post processing, i.e. extract text...based on the plurality of input commands [0074]);
generate a document that combines the extracted text from the plurality of separate text files (post processing may include data extraction from one or more transcribed session files, i.e. plurality of separate text files, where the extracted data, i.e. extracted text, may be reassembled into a document or report, i.e. generate a document that combines [0074],[0092]); and
provide the document to the user device (the extracted data may be reassembled into a document or report [0092], where postprocessed files, i.e. .  
Quidilig, Lee, and Kahn are analogous art because they are from a similar field of endeavor in enabling a user to edit dictated information. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the generation of a separate audio file and subsequent conversion into text teachings of Quidilig, as modified by Lee, with the extraction of text from the session document into another report as taught by Kahn. The motivation to do so would have been to achieve a predictable result of enabling the saving of data into a report in a database for later distribution to other users (Kahn [0050]).

Regarding claim 12, Quidilig in view of Lee and Kahn teaches claim 11, and Quidilig further teaches
	the input received from the user identifying the plurality of highlight or vocabulary events is received as a voice command … (as the user listens to the echo audio stream, corrections can be made, such as providing an alternate to the text, or making text bold/underlined/italicized, i.e. plurality of highlight or vocabulary events, based on the user speaking a command, i.e. input received from the user…is received as a voice command [0069, including table of commands]).  
	And Lee teaches a microphone (the AV input unit of a mobile terminal may include a microphone for receiving external audio signals, i.e. received…via a microphone [0402],[0415],[0417]).

Claim(s) 13 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Quidilig, in view of Lee, in view of Kahn, and further in view of Duncan.

Regarding claim 13, Quidilig in view of Lee and Kahn teaches claim 11.
While Quidilig in view of Lee and Kahn provides a user giving a command to create different corrections, Quidilig in view of Lee and Kahn does not specifically teach that the type of correction is stored as a tag, and thus does not teach
receive a tag provided by the user of the user device identifying a category associated with at least one event of the plurality of highlight or vocabulary events; and
modify the extracted text associated with the at least one event in the document to include the tag.  
Duncan, however, teaches receive a tag provided by the user of the user device identifying a category associated with at least one event of the plurality of highlight or vocabulary events (the user input to create an annotation includes clicking on different buttons, i.e. provided by the user of the user device…associated with at least one event of the plurality of highlight or vocabulary events, where each button identifies the type of annotation, i.e. identifying a category, where the type of annotation is used as a header, i.e. a tag, in the annotation file (7:60-8:13),(9:54-10:15)); and
modify the ... text associated with the at least one event in the document to include the tag (the type of annotation is used as a header, i.e. include the tag, in the annotation file, i.e. modify the text (7:60-8:13),(9:54-10:15)).  
Where Kahn teaches that the text may be extracted for reassembly into another document or report [0074],[0092].
Quidilig, Lee, Kahn, and Duncan are analogous art because they are from a similar field of endeavor in enabling a user to edit information. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the user giving a command to create different corrections teachings of Quidilig, as modified by Lee and Kahn, with saving the type of annotation as a header in an annotation file as taught by Duncan. The motivation to do so would have been to substitute similar elements to achieve a predictable result of allowing an annotation to be read along with the associated text (Duncan (9:63-67)).

Regarding claim 14, Quidilig in view of Lee and Kahn teaches claim 11, and Quidilig further teaches
generate a text version of the audio file (the user’s speech as an input audio stream, i.e. audio file, is obtained by the server and processed by a speech to text function to convert the audio into text, i.e. generate a text version [0037:8-15],[0038]);
augment the text version based on plurality of highlight or vocabulary events (corrections can be made to the user input text, i.e. augment the text version, such as providing an alternate to the text, or making text bold/underlined/italicized, i.e. .
While Quidilig in view of Lee and Kahn provides making corrections to the user input text, Quidilig in view of Lee and Kahn does not specifically teach showing the augmented text to the user, and thus does not teach
provide the augmented text version to the user device.  
Duncan, however, teaches provide the augmented text version to the user device (output modes can include graphics via a visual display of the device, i.e. user device (1:56-61), including displaying the annotation and associated content graphically, i.e. providing the augmented text version (2:30-32),(9:65-67)).  
Quidilig, Lee, Kahn, and Duncan are analogous art because they are from a similar field of endeavor in enabling a user to edit information. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the making corrections to the user input text teachings of Quidilig, as modified by Lee and Kahn, with displaying the annotations and associated content graphically as taught by Duncan. The motivation to do so would have been to substitute similar elements to achieve a predictable result of enabling a visible sign to the user that an annotation is present in a text (Duncan (8:4-33)).

Claim(s) 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Quidilig, in view of Lee, in view of Kahn, and further in view of Ganong, III (U.S. PG Pub No. 2014/0278354), hereinafter Ganong.


While Quidilig in view of Lee and Kahn provides the creation of a new audio file based on user-selected input, Quidilig in view of Lee and Kahn does not specifically teach that the chosen amount of the audio section includes a period of time before and after the input event, and thus does not teach
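		the splitting of the audio file into the plurality of separate audio files for the plurality of highlight or vocabulary events includes generating a new audio file for each of the plurality of highlight or vocabulary events to include a first portion of time prior to a corresponding event time position and a second portion of time after the corresponding event time position.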
Ganong, however, teaches the splitting of the audio file into the plurality of separate audio files for the plurality of highlight or vocabulary events includes generating a new audio file for each of the plurality of highlight or vocabulary events to include a first portion of time prior to a corresponding event time position and a second portion of time after the corresponding event time position (a server may execute a position determination engine (PDE) upon receiving a request from a reader to find a particular location, i.e. the plurality of highlight or vocabulary events, in an audiobook, i.e. audio file [0039], where the request identifies a source position in the audio representation, where the source position may be represented as time into the audio representation where the same position is found, i.e. corresponding event time position [0052], and the audio segment, i.e. the plurality of separate audio files, used to identify and confirm the location of the source position may be a longer segment that includes the source position at a specific position within the audio .
Quidilig, Lee, Kahn, and Ganong are analogous art because they are from a similar field of endeavor in enabling a user to find specific information in audio files. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the creation of a new audio file based on user-selected input teachings of Quidilig, as modified by Lee and Kahn, with the use of different lengths of audio segments surrounding a particular source position as taught by Ganong. The motivation to do so would have been to achieve a predictable result of enabling a method to identify a target audio position in an audio representation of a work (Ganong [0002]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NICOLE A K SCHMIEDER/           Examiner, Art Unit 2659    

/PIERRE LOUIS DESIR/           Supervisory Patent Examiner, Art Unit 2659