DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment
	This office action is responsive to applicant’s remarks received on July 01, 2021. Claims 1-12 and newly added claims 13-20 are pending. 


Response to Arguments
 	Applicant’s arguments with respect to the amended claims filed on July 01, 2021 have been considered but are moot.  However, upon further consideration, a new ground(s) of rejection is made in view of Avore et al. (US 20160306788 A1 hereinafter, Avore ‘788) in combination with Cook et al. (US 20140163981 hereinafter, Cook ‘981) and further in view of Diamant et al. (US 20190341050 A1 hereinafter, Diamant ‘050).


Drawings
(The previous drawings objections are withdrawn in light of the applicant’s amendments.)

Claim Objections - 37 CFR 1.75(a)
(The previous claim objections are withdrawn in light of the applicant’s amendments.)


Claim Interpretations - 35 USC § 112(f)
(The previous claim interpretations are taken in consideration of applicant’s amendments.)


Claim Objections
Claim 8 is objected to because of the following informalities:  Claim 8 states in part “…and receive the texts from the computing device. the computing device, wherein the computing device comprises…” 
A period (.) is inserted into the middle of the claim where a semicolon (;) should appear. Also, there appears to be a typo in that “the computing device” is recited an extra time. Please remove the second “the computing device”. Examiner suggests changing the claim to read in part “…and receive the texts from the computing device; wherein the computing device comprises…”.
For continued examination purposes and in the best interests of compact prosecution, Examiner assumes that Claim 8 has been changed to reflect Examiner’s suggestions.
The Examiner has tried to interpret the claims, as best the Examiner can ascertain, to develop an appropriate prior art rejection in the interests of compact prosecution. If any interpretation of the Examiner's is considered incorrect or off-base, the Examiner invites the Applicant to show the portions of the Applicant's specification which give a more proper interpretation of the claimed subject matter.



Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Avore et al. (US 20160306788 A1 hereinafter, Avore ‘788) in combination with Cook et al. (US 20140163981 hereinafter, Cook ‘981).
Burton et al. (US 20160027442 hereinafter, Burton ‘442).
Regarding claim 1; Avore ‘788 discloses a sound box (Fig. 1, Conference System 100) 
comprising: 
one or more processors  (i.e. The conference system will also employ a variety of communication circuitry, as well as processing circuitry that includes one or more processors (e.g., a CPU). Paragraph 0025),
and memory (i.e. The conference system will also employ a variety of communication circuitry, as well as processing circuitry that includes one or more storage devices (e.g., a memory). Paragraph 0025) 
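comprising instructions (i.e. Any of the memory devices 1214 is or includes a random access memory (RAM) (such as a Dynamic RAM (DRAM) or Static RAM (SRAM)), a flash memory (based on, e.g., NAND or NOR technology), a hard disk, a magneto-optical medium, an optical medium, cache memory, a register (e.g., that holds instructions), or other type of device that performs the volatile or non-volatile storage of data and/or instructions (e.g., software that is executed on or by processors 1212). Paragraph 0088),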
when executed by the one or more processors, cause the sound box to: receive audio data from a computing device (i.e. Fig. 1 shows a non-limiting example conference system 100 for facilitating a conversation between multiple participants 101-1 to 101-n and recording audio data of a conversation between the multiple participants 101-1 to 101-n. The participants 101-1 to 101-n will access the system 100 using an electronic interface that could include, but is not limited to, a telephone system and/or a computing device. Paragraphs 0024-0025)
process the audio data (i.e. Fig. 2 shows an example for processing transcript data. The software modules shown in Fig. 2 are stored in and executed by hardware components (such as processors and memories). The transcript system 250 can obtain and store information related to transcript data of one or more spoken dialogues. The transcript system 250 can generate a transcript using transcript generator 251 from the audio captured by conference system 100, shown in Fig. 1. Paragraphs 0027-0029).
send the processed audio data to the computing device for converting the processed audio data to text (Fig. 4, Action 302 i.e. At action 302, the stored audio data is transcribed into a text format using the transcript generator 251. In one example, the audio data can be converted to a raw text file automatically using different speech and audio recognition techniques. In another example, a user could listen to the audio recording and manually generate the data file using an input device. The transcript generator 251 can also convert the data into an XML file (at action 303) for processing by the server system 200. Paragraph 0042).
Avore ‘788 does not expressly disclose the limitation as expressed below.
Cook ‘981 discloses receive the texts from the computing device (i.e. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session. Paragraph 0020).
Avore ‘788 and Cook ‘981 are combinable because they are from the same field of endeavor of speech systems (Cook ‘981 at “Technical Field”). 
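	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Avore ‘788 by adding the limitations as taught by Cook ‘981. The motivation for doing so would have been that there appears to be a large commercial market for accurate but inexpensive automatic transcription of speech sessions. As a result, the combination would provide accurate and automatic transcription of real-world audio from speech sessions with multiple different speakers such as teleconferences, meeting records, police interviews, etc. Therefore, it would have been obvious to combine Avore ‘788 with Cook ‘981 to obtain the invention as specified.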

Regarding claim 2; Avore ‘788 discloses wherein the instructions, when executed by the one or more processors, cause the sound box to: send the processed audio data to the computing device by communicating with the computing device via Bluetooth, Wifi or USB (i.e. In some embodiments, each or any of the network interface devices 1216 includes one or more circuits (such as a baseband processor and/or a wired or wireless transceiver), and implements layer one, layer two, and/or higher layers for one or more wired communications technologies (such as Ethernet (IEEE 802.3)) and/or wireless communications technologies (such as Bluetooth, WiFi). Paragraph 0089).

Regarding claim 3; Claim 3 contains substantially the same subject matter as claim 1. Therefore, claim 3 is rejected on the same grounds as claim 1.

Regarding claim 4; Avore ‘788 does not expressly disclose the limitation as expressed below.
Cook ‘981 discloses correcting or typesetting a portion of the texts (i.e. The system may also include a user review module for user review and correction of the final transcription output. Paragraph 0013)
Avore ‘788 and Cook ‘981 are combinable because they are from the same field of endeavor of speech systems (Cook ‘981 at “Technical Field”). 


Regarding claim 5; Cook ‘981 discloses extracting key words or key segments from the texts to obtain key content of the processed audio data (i.e. A keyword transcript module provides for real time processing of one or more speech signals by a human agent to generate a partial transcript of keywords. Paragraph 0009).
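
For illustration only, the claimed step of extracting key words from the texts can be sketched as a simple word-frequency count. This sketch is not drawn from Cook ‘981 or from the application; the function name extract_keywords, the stop-word list, and the top_n parameter are all assumptions introduced for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; not taken from any cited reference.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on", "for"}


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent words in text that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Sort by descending count, then alphabetically, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

A frequency count is only one possible reading of "key content"; the claim language itself does not specify how key words are selected.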

Regarding claim 6; Cook ‘981 discloses analyzing the processed audio data to distinguish and mark audio data with different participants (i.e. A speech selection module is for user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance. The speech transcription module may be adapted to allow reordering and realigning of keywords in portions of the partial transcription associated with low recognition confidence. Paragraphs 0007 & 0011).
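
For illustration only, marking content with different participants can be sketched as follows, assuming each segment has already been distinguished by speaker. The function name mark_by_participant and the (speaker, text) tuple layout are assumptions for this sketch and do not appear in Cook ‘981 or in the application.

```python
def mark_by_participant(segments):
    """Merge consecutive segments from the same speaker and label each with that speaker.

    segments is a list of (speaker, text) tuples in spoken order (hypothetical layout).
    """
    merged = []
    for speaker, text in segments:
        if merged and merged[-1][0] == speaker:
            # Same participant is still talking: extend the previous segment.
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return [f"[{speaker}] {text}" for speaker, text in merged]
```

The sketch covers only the marking step; how the speakers are distinguished in the first place is outside its scope.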

Regarding claim 8; Avore ‘788 discloses a system (Fig. 1, Paragraph 0010) 
comprising:
a sound box (Fig. 1, Conference System 100); 
wherein the sound box comprises:
one or more first processors (i.e. The conference system will also employ a variety of communication circuitry, as well as processing circuitry that includes one or more processors (e.g., a CPU). Paragraph 0025),
and first memory (i.e. The conference system will also employ a variety of communication circuitry, as well as processing circuitry that includes one or more storage devices (e.g., a memory). Paragraph 0025) 
storing first instructions (i.e. Any of the memory devices 1214 is or includes a random access memory (RAM) (such as a Dynamic RAM (DRAM) or Static RAM (SRAM)), a flash memory (based on, e.g., NAND or NOR technology), a hard disk, a magneto-optical medium, an optical medium, cache memory, a register (e.g., that holds instructions), or other type of device that performs the volatile or non-volatile storage of data and/or instructions (e.g., software that is executed on or by processors 1212). Paragraph 0088),
when executed by the one or more first processors, cause the sound box to: receive audio data from a computing device (i.e. Fig. 1 shows a non-limiting example conference system 100 for facilitating a conversation between multiple participants 101-1 to 101-n and recording audio data of a conversation between the multiple participants 101-1 to 101-n. The participants 101-1 to 101-n will access the system 100 using an electronic interface that could include, but is not limited to, a telephone system and/or a computing device. Paragraphs 0024-0025)
process the audio data (i.e. Fig. 2 shows an example for processing transcript data. The software modules shown in Fig. 2 are stored in and executed by hardware components (such as processors and memories). The transcript system 250 can obtain and store information related to transcript data of one or more spoken dialogues. The transcript system 250 can generate a transcript using transcript generator 251 from the audio captured by conference system 100, shown in Fig. 1. Paragraphs 0027-0029).
send the processed audio data to the computing device for converting the processed audio data to texts (Fig. 4, Action 302 i.e. At action 302, the stored audio data is transcribed into a text format using the transcript generator 251. In one example, the audio data can be converted to a raw text file automatically using different speech and audio recognition techniques. In another example, a user could listen to the audio recording and manually generate the data file using an input device. The transcript generator 251 can also convert the data into an XML file (at action 303) for processing by the server system 200. Paragraph 0042).
the computing device (Fig. Processors 1212), 
wherein the computing device comprises: one or more second processors (Fig. Processors 1212),
and second memory (Fig. Memory 1214), 
storing second instructions (i.e. Any of the memory devices 1214 is or includes a random access memory (RAM) (such as a Dynamic RAM (DRAM) or Static RAM (SRAM)), a flash memory (based on, e.g., NAND or NOR technology), a hard disk, a magneto-optical medium, an optical medium, cache memory, a register (e.g., that holds instructions), or other type of device that performs the volatile or non-volatile storage of data and/or instructions (e.g., software that is executed on or by processors 1212). Paragraph 0088),
when executed by the one or more second processors, cause the computing device to: send the audio data to the sound box (i.e. Fig. 1 shows a non-limiting example conference system 100 for facilitating a conversation between multiple participants 101-1 to 101-n and recording audio data of a conversation between the multiple participants 101-1 to 101-n. The participants 101-1 to 101-n will access the system 100 using an electronic interface that could include, but is not limited to, a telephone system and/or a computing device. Paragraphs 0024-0025)
receive the processed audio data from the sound box (i.e. Fig. 2 shows an example for processing transcript data. The software modules shown in Fig. 2 are stored in and executed by hardware components (such as processors and memories). The transcript system 250 can obtain and store information related to transcript data of one or more spoken dialogues. The transcript system 250 can generate a transcript using transcript generator 251 from the audio captured by conference system 100, shown in Fig. 1. Paragraphs 0027-0029).
send the processed audio data to a server for converting the processed audio data to texts  (Fig. 4, Action 302 i.e. At action 302, the stored audio data is transcribed into a text format using the transcript generator 251. In one example, the audio data can be converted to a raw text file automatically using different speech and audio recognition techniques. In another example, a user could listen to the audio recording and manually generate the data file using an input device. The transcript generator 251 can also convert the data into an XML file (at action 303) for processing by the server system 200. Paragraph 0042).
Avore ‘788 does not expressly disclose the limitation as expressed below.
Cook ‘981 discloses receive the texts from the computing device (i.e. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session. Paragraph 0020).
and send the texts to the sound box (i.e. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session. Paragraph 0020).
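Avore ‘788 and Cook ‘981 are combinable because they are from the same field of endeavor of speech systems (Cook ‘981 at “Technical Field”). 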
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Avore ‘788 by adding the limitations as taught by Cook ‘981. The motivation for doing so would have been that there appears to be a large commercial market for accurate but inexpensive automatic transcription of speech sessions. As a result, the combination would provide accurate and automatic transcription of real-world audio from speech sessions with multiple different speakers such as teleconferences, meeting records, police interviews, etc. Therefore, it would have been obvious to combine Avore ‘788 with Cook ‘981 to obtain the invention as specified.

Regarding claim 9; Claim 9 contains substantially the same subject matter as Claim 4. Therefore, claim 9 is rejected on the same grounds as claim 4.  

Regarding claim 10; Claim 10 contains substantially the same subject matter as Claim 5. Therefore, claim 10 is rejected on the same grounds as claim 5.

Regarding claim 11; Claim 11 contains substantially the same subject matter as Claim 6. Therefore, claim 11 is rejected on the same grounds as claim 6.

Regarding claim 12; Claim 12 contains substantially the same subject matter as Claim 7. Therefore, claim 12 is rejected on the same grounds as claim 7.

Regarding claim 13; Avore ‘788 discloses wherein the instructions, when executed by the one or more processors, cause the sound box to: output the received audio data (i.e. Each or any of the display interfaces 1218 is or includes one or more circuits that receive data from the processors 1212, generate corresponding image data based on the received data, and/or output the generated image data to the display device 1222, which displays the image data. Paragraph 0090).

Regarding claim 14; Avore ‘788 discloses wherein the instructions, when executed by the one or more processors, cause the sound box to: copy the processed audio data (i.e. the matching module 208 can attempt to determine if a speaker's name matches a record in database 206. If the name matches a record in database 206, the server system 200 automatically matches the parsed portion of dialogue to the speaker at action 308. In one example, the server system 200 could also require the creation of a record for a new speaker so that any future dialogue spoken by the individual is automatically matched. At action 309, the matching module 208 can store the dialogue and the matched information in database 206. Paragraphs 0046-0047)

Regarding claim 16; Avore ‘788 discloses wherein the computing device is a mobile phone or a computer (i.e. The participants 101-1 to 101-n will access the system 100 using an electronic interface that could include, but is not limited to, a telephone system and/or a computing device. Paragraph 0025)

Regarding claim 17; Avore ‘788 discloses wherein the computing device is configured to send the processed audio data to a server for converting the processed audio data to the texts (Fig. 4, Action 302 i.e. At action 302, the stored audio data is transcribed into a text format using the transcript generator 251. In one example, the audio data can be converted to a raw text file automatically using different speech and audio recognition techniques. In another example, a user could listen to the audio recording and manually generate the data file using an input device. The transcript generator 251 can also convert the data into an XML file (at action 303) for processing by the server system 200. Paragraph 0042).

Regarding claim 18; Avore ‘788 discloses wherein the instructions, when executed by the one or more processors, cause the sound box to: store the processed audio data (Fig. 4, Action 302 i.e. At action 302, the stored audio data is transcribed into a text format using the transcript generator 251. Paragraph 0042).

Regarding claim 19; Claim 19 contains substantially the same subject matter as Claim 13. Therefore, claim 19 is rejected on the same grounds as claim 13.

Regarding claim 20; Claim 20 contains substantially the same subject matter as Claim 14. Therefore, claim 20 is rejected on the same grounds as claim 14.

4.	Claims 7 & 15 are rejected under 35 U.S.C. 103 as being unpatentable over Avore ‘788 in combination with Cook ‘981 and further in view of Diamant ‘050.
Regarding claim 7; Avore ‘788 does not expressly disclose the limitation as expressed below.
Diamant ‘050 discloses marking, based on the analyzing the processed audio data, the texts with different participants (i.e. A face recognition machine is operated to recognize a face of a first conference participant in the digital video, and a speech recognition machine is operated to translate the computer-readable audio signal into a first text. An attribution machine attributes the text to the first conference participant. A second computer-readable audio signal is processed similarly, to obtain a second text attributed to a second conference participant. A transcription machine automatically creates a transcript including the first text attributed to the first conference participant and the second text attributed to the second conference participant. See Abstract).
Avore ‘788 and Diamant ‘050 are combinable because they are from the same field of endeavor of speech systems (Diamant ‘050 at “Background”). 
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Avore ‘788 by adding the 

Regarding claim 15; Diamant ‘050 discloses wherein the instructions, when executed by the one or more processors, cause the sound box to: process the audio data by performing one of the following: beamforming associated with the received audio data; noise reduction on the received audio data; or amplifying the received audio data (i.e. Fig. 3 schematically shows beamforming of sound signals by a beamforming machine. Paragraphs 0007, 0027 & 0030).
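
For illustration only, the three alternative processing operations recited in claim 15 can be sketched in minimal form. Nothing below is taken from Diamant ‘050 or from the application; the function names, the integer-sample delay model, the threshold-based noise gate, and the assumption that samples are floats in the range -1.0 to 1.0 are all hypothetical simplifications.

```python
def amplify(samples, gain):
    """Scale each sample by gain, clipping the result to the range [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def noise_gate(samples, threshold):
    """Crude noise reduction: zero out samples whose magnitude is below threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def delay_and_sum(channels, delays):
    """Basic delay-and-sum beamforming with integer sample delays.

    channels is a list of equal-rate sample lists, one per microphone.
    delays[k] is the number of samples by which channel k is advanced
    before the channels are averaged (assumed to be known in advance).
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]
```

Because the claim recites the operations in the alternative ("one of the following"), any single function above would correspond to the recited processing step under this simplified reading.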


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy P. Goddard, can be reached on 571-272-7773.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MARCUS T. RILEY, ESQ.
Examiner
Art Unit 2677



/MARCUS T RILEY/            Primary Examiner, Art Unit 2677