DETAILED ACTION
In response to the communication filed on 26 January 2022, this is the first Office Action on the merits. Claims 1-20 are pending.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55 for CN201310282495.7.

Claim Objections
Claims 6-7 and 14-15 are objected to because of the following informalities:  
Claims 6 and 14 recite “the file name”, which should read --a file name--, as it appears to be a typographical error and may cause an antecedent basis issue.
Claims 7 and 15 recite “the second text”, which should read --a second text--, as it appears to be a typographical error and may cause an antecedent basis issue.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 3-5, 8-11, 13 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Whitaker et al. (US 2010/0250304 A1, hereinafter “Whitaker”) in view of Hilem (US 2012/0321281 A1, hereinafter “Hilem”) further in view of Jiang et al. (US 2013/0302018 A1, hereinafter “Jiang”).

Regarding claim 1, Whitaker teaches
A method, comprising: (see Whitaker, [0009] “a method for annotating”). 
initiating recording audio content; (see Whitaker, [0040] “Audio capturing module 106 is configured to record audio associated with process”).
receiving a first input; (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; [0053] “may include human interaction (e.g., a human interacting with a computer) to annotate process instances”; Fig. 23 – Process step tag 2302).
displaying,… (see Whitaker, Fig. 23 – process step tag 2302 – 00:00.0) the first input, (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; [0189] “FIG. 23 shows a time line 2300 indicating process step tags applied to a recording”; Fig. 23 – Process step tag 2302) a first tag on a first position on a time axis, wherein the first position is associated with a first time point; (see Whitaker, [0100] “indicate a time position within a recording at the time that a process step tag is applied”; Fig. 23 – process step tag 2302 – 00:00.0).
receiving a second input (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; [0053] “may include human interaction (e.g., a human interacting with a computer) to annotate process instances”; Fig. 23 – Process step tag 2304) at a second time point (see Whitaker, Fig. 23 - 02:34.5) while the first tag is displayed; (see Whitaker, Fig. 23 – process step tag 2302 is still displayed while process step tag 2304 is being displayed).
displaying,… (see Whitaker, [0100] “indicate a time position within a recording at the time that a process step tag is applied”; Fig. 23 – process step tag 2304 – 02:34.5; [0189] “FIG. 23 shows a time line 2300 indicating process step tags applied to a recording”) the second input, (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; Fig. 23 – Process step tag 2304) a second tag on a second position on the time axis, wherein the second position is associated with a second time point; (see Whitaker, [0100] “indicate a time position within a recording at the time that a process step tag is applied”; Fig. 23 – process step tag 2304 – 02:34.5).
receiving a third input (see Whitaker, [0072] “receives third communication”) while the first tag and the second tag are displayed; (see Whitaker, Fig. 23 – process step tag 2302 and process step tag 2304 are both being displayed).
identifying,… (see Whitaker, [0053] “may incorporate voice recognition techniques to recognize words/sentences associated with process steps and/or other items of interest in recorded voice of a voice recording, and to automatically generate a searchable transcript, and apply corresponding process step tags, attributes, and/or discovery information to the voice recording”) the third input, (see Whitaker, [0072] “receives third communication”) a first text associated with the first tag, wherein the first text is obtained by performing speech recognition (see Whitaker, [0053] “may incorporate voice recognition techniques to recognize words/sentences associated with process steps and/or other items of interest in recorded voice of a voice recording, and to automatically generate a searchable transcript, and apply corresponding process step tags, attributes, and/or discovery information to the voice recording”) of the audio content; (see Whitaker, [0189] “process step tags applied to a recording”). 
receiving a fourth input; and (see Whitaker, [0072] “Fourth communication signal 708 includes the selected process instance 112, and is received”). 
the fourth input, (see Whitaker, [0072] “Fourth communication signal 708 includes the selected process instance 112, and is received”).
Whitaker does not explicitly teach in response to the first input, in response to the second input, in response to the third input, generating, in response to the fourth input, an audio file. 
However, Hilem discloses visual timeline associated with the digital information and also teaches 
apply visual tags in response to user input (see Hilem, [0010] “includes at least one processor configured to generate a visual timeline associated with the digital video, reserve one or more frames of the digital video for insertion of the user content in response to user input received from the user interface of one or more of the client devices… using one or more visual tags”; [claim 18] “reserve one or more frames of the digital video for insertion of the user content in response to user input received from the user interface of one or more of the client devices, temporally indicate the reserved frames on the visual timeline using one or more visual tags”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include functionality of performing a specific action in response to a user input as being taught by Hilem in the system taught by Whitaker, to yield the predictable results of effectively processing digital information (see Hilem, [0018] “the display 114 may provide basic navigational controls 118 which enable a user to pause, stop or play the digital video 116… which enables a user to view the progress of the digital video 116… may be retrieved from a digital video database 102 of a shared network 108”). 
The proposed combination of Whitaker and Hilem does not explicitly teach generating, in response to the fourth input, an audio file. 
However, Jiang discloses video presentation based on point of interest and teaches
generating, an audio file (see Jiang, [0014] “generating a composite point of interest video file from the video file based on the start time and end time of the point of interest”). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include functionality taught by Jiang in the system taught by the proposed combination of Whitaker and Hilem, to yield the predictable results of effectively generating a point of interest data file (see Jiang, [0040] “recording device 108 sends the VOD file to media processing device 110 in response to receiving the request. In 326, media processing device 110 generates a task file from the POI data file and a VOD file name. In 328, the task file is executed by media processing device 110. In 330, media processing device 110 generates the composite POI video file which includes the video clips designated as points of interest within the POI data file from the task file”).
Claims 9 and 17 incorporate substantively all the limitations of claim 1 in a device (see Whitaker, [0053] “Process annotation system 302 and process analysis system 30… may be implemented as computer code configured to be executed in one or more processors”; Fig. 24) and computer-readable medium form (see Whitaker, [0202] “Devices in which embodiments may be implemented may include storage, such as storage drives, memory devices, and further types of computer-readable media… Such program code, when executed in one or more processors, causes a device to operate”) and are rejected under the same rationale.

Regarding claim 3, the proposed combination of Whitaker, Hilem and Jiang teaches
further comprising obtaining the first text from (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 – 00:00.0; [0109] “information (e.g., alphanumeric text) to assign to the discovery”) a network-side server (see Whitaker, [0057] “to communicate with server 506 through network 504”).
Claims 10 and 18 incorporate substantively all the limitations of claim 3 in a device and computer-readable medium form and are rejected under the same rationale.

Regarding claim 4, the proposed combination of Whitaker, Hilem and Jiang teaches
further comprising obtaining the first text (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 – 00:00.0; [0109] “information (e.g., alphanumeric text) to assign to the discovery”) from a local device (see Whitaker, [0169] “reviewed with client”).   
Claims 11 and 19 incorporate substantively all the limitations of claim 4 in a device and computer-readable medium form and are rejected under the same rationale.

Regarding claim 5, the proposed combination of Whitaker, Hilem and Jiang teaches
wherein receiving the first input comprises receiving the first input (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; Fig. 23 – Process step tag 2302) from an operation on a screen (see Whitaker, [0053] “including by interacting with a keyboard, a mouse, a touch screen”).
Claim 13 incorporates substantively all the limitations of claim 5 in a device form and is rejected under the same rationale.

Regarding claim 8, the proposed combination of Whitaker, Hilem and Jiang teaches
wherein the audio file comprises (see Jiang, [0014] “generating a composite point of interest video file from the video file based on the start time and end time of the point of interest”) a first audio file part corresponding to (see Jiang, [0042] “a first POI video clip (No. 1) is designated with a start time at the 1:00 minute mark and an end time at the 3:30 minute mark. A second POI video clip (No. 2) is designated with a start time at the 15:10 minute mark and an end time at the 31:34 minute mark. In a particular embodiment, a composite POI video generated from this POI data will include the portions of the original video between the 1:00 minute mark and the 3:30 minute mark, and the 15:10 minute mark and the 31:34 minute mark while omitting the rest of the original video”) the first input (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; [0053] “may include human interaction (e.g., a human interacting with a computer) to annotate process instances”; Fig. 23 – Process step tag 2302; [0100] “indicate a time position within a recording at the time that a process step tag is applied”; Fig. 23 – process step tag 2302 – 00:00.0) and a second audio file part corresponding to (see Jiang, [0042] “a first POI video clip (No. 1) is designated with a start time at the 1:00 minute mark and an end time at the 3:30 minute mark. A second POI video clip (No. 2) is designated with a start time at the 15:10 minute mark and an end time at the 31:34 minute mark. In a particular embodiment, a composite POI video generated from this POI data will include the portions of the original video between the 1:00 minute mark and the 3:30 minute mark, and the 15:10 minute mark and the 31:34 minute mark while omitting the rest of the original video”) the second input (see Whitaker, [0082] “A user determines what particular process step is being performed during a section of the recording by viewing/listening to the section of the recording, and is enabled by process step tag interface 1106 to apply a process step tag to the section of the recording that is descriptive of the determined process step”; Fig. 23 – Process step tag 2304; [0100] “indicate a time position within a recording at the time that a process step tag is applied”; Fig. 23 – process step tag 2304 – 02:34.5). The motivation for the proposed combination is maintained.
Claim 16 incorporates substantively all the limitations of claim 8 in a device form and is rejected under the same rationale. 

Claims 2, 12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Whitaker, Hilem and Jiang in view of Verhey-Henke et al. (US 2007/0027670 A1, hereinafter “Verhey”).

Regarding claim 2, the proposed combination of Whitaker, Hilem and Jiang teaches
the first text (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 – 00:00.0; [0109] “information (e.g., alphanumeric text) to assign to the discovery”) before (see Hilem, [0021] “Selectable thumbnails representative of previously recorded user content 122 may also be displayed to the immediate user so as to enable the user to preview user content 122 before”) receiving the first input (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 – 00:00.0; [0109] “information (e.g., alphanumeric text) to assign to the discovery”).
The proposed combination of Whitaker, Hilem and Jiang does not explicitly teach further comprising presetting a target language of the first text.
However, Verhey teaches
further comprising presetting a target language of terms (see Verhey, [0018] “parses… using predetermined language specific terms, syntax, labels and identifiers to identify user interface text strings… associated with a text string and corresponding image element”).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include functionality of target language as being taught by Verhey in the system taught by the proposed combination of Whitaker, Hilem and Jiang to yield the predictable results of reducing the data processing burden and processing error and improving data processing and interfacing speed (see Verhey, [0022] “This reduces the data processing burden and processing error and improves data processing and interfacing speed by reducing, compilation, input-output interfacing and communication of redundant and previously translated text strings for translation as well as in duplicative redundant processing of re-translated replicated text strings and text string portions”).
Claims 12 and 20 incorporate substantively all the limitations of claim 2 in a device and computer-readable medium form and are rejected under the same rationale.

Claims 6-7 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Whitaker, Hilem and Jiang in view of Blessing et al. (US 2009/0138405 A1, hereinafter “Blessing”).

Regarding claim 6, the proposed combination of Whitaker, Hilem and Jiang teaches
wherein the file name is (see Jiang, [0014] “generating a composite point of interest video file”).
The proposed combination of Whitaker, Hilem and Jiang does not explicitly teach file name is in a stipulated naming format. 
However, Blessing discloses capturing elements such as audio visual samples and also teaches
file name is in a stipulated naming format (see Blessing, [0062] “The information, which relates to the dictated speech elements or speech segments may be stored as text, which may be used as the file name for the captured audio and video sample file”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include functionality of a specific file naming format as being taught by Blessing in the system taught by the proposed combination of Whitaker, Hilem and Jiang to yield the predictable results of effectively identifying specific information (see Blessing, [0062] “stored is information, which identifies the captured biometric elements... The wav-file may however be designated with a code xyz.wav with xyz pointing to dictation "two" or "2" contained in another database; i.e. the identifier xyz would link the captured biometric data contained in one memory range to dictated data stored in another memory range”). 
Claim 14 incorporates substantively all the limitations of claim 6 in a device form and is rejected under the same rationale.

Regarding claim 7, the proposed combination of Whitaker, Hilem and Jiang teaches
wherein the audio file… (see Jiang, [0014] “generating a composite point of interest video file”) the first text and (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 – 00:00.0; [0109] “information (e.g., alphanumeric text) to assign to the discovery”) the second text (see Whitaker, [0100] “discoveries may also be related to a specific time in the recording”; Fig. 23 - 02:34.5; [0109] “information (e.g., alphanumeric text) to assign to the discovery”).
The proposed combination of Whitaker, Hilem and Jiang does not explicitly teach the audio file has a file name, and wherein the file name comprises the first text and the second text. 
However, Blessing discloses capturing elements such as audio visual samples and also teaches
the audio file has a file name, and wherein the file name comprises speech segments as text for captured audio (see Blessing, [0062] “The information, which relates to the dictated speech elements or speech segments may be stored as text, which may be used as the file name for the captured audio and video sample file”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include functionality of a specific file naming format as being taught by Blessing in the system taught by the proposed combination of Whitaker, Hilem and Jiang to yield the predictable results of effectively identifying specific information (see Blessing, [0062] “stored is information, which identifies the captured biometric elements... The wav-file may however be designated with a code xyz.wav with xyz pointing to dictation "two" or "2" contained in another database; i.e. the identifier xyz would link the captured biometric data contained in one memory range to dictated data stored in another memory range”). 
Claim 15 incorporates substantively all the limitations of claim 7 in a device form and is rejected under the same rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAISHALI SHAH whose telephone number is (571)272-8532. The examiner can normally be reached Monday - Friday (7:30 AM to 4:00 PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE, can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/VAISHALI SHAH/
Primary Examiner, Art Unit 2156