Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments filed on 5/24/2022 have been considered by the examiner.
On page 6 the applicant argues:

[Image: excerpt of applicant's argument (media_image1.png); text not reproduced]

Under the broadest reasonable interpretation, the examiner reads a file as a stream of data, and as such digitized speech is considered a file. Further, the cited portion explicitly mentions an audio file created from the audio event ([0038] For example, in some embodiments, a user may speak 191 a command or desired action (execute playlist, order replacement spokes/blades, and/or obtain traffic conditions from a traffic server) [events] which is captured as an audio file…).
On page 7 the applicant argues:

[Image: excerpt of applicant's argument (media_image2.png); text not reproduced]

The applicant’s arguments have been considered but are not found persuasive. The ‘audio event’ is mapped to the spoken command, and the corresponding file is the digitized speech data (also referred to as an ‘audio file’ in the cited passage, as described in the previous paragraph).
On page 9 the applicant argues:

[Image: excerpt of applicant's argument (media_image3.png); text not reproduced]

The applicant’s arguments have been considered but are not found persuasive. The cited passage from Rao teaches ([0047] Audio files may comprise computer-readable data containing audio speech signals captured and stored during or after phone calls associated with callers. In some cases, an audio file may be stored or retrieved from a server or database. And in some cases, an audio file may be incoming data stream for an ongoing call, where the analysis server decodes the call audio to convert or otherwise process the call signal of the ongoing call in real-time or near real-time. In the exemplary process 200, the audio files may be transmitted to or received by the analysis server in order to train, test, and/or implement (e.g., execute a query) the models used by the analysis server.). In the highlighted portion, the examiner maps the “audio files” to the limitation “audio mapping files” because each audio file is used to train and test the models and thus serves as an audio mapping file; each tests the model based on different audio data. Similarly, an audio data file is a test case that flows through the system and is therefore mapped to “test flow”.
On page 9 the applicant traverses the rejection of claim 10 based on amendments. A new reference is used to reject the claim, as set forth in the 35 USC 103 section below.
On page 11 the applicant traverses the rejection of claim 17 based on amendments. A new reference is used to reject the claim, as set forth in the 35 USC 103 section below.
In light of the above, the rejection of claims 1-20 is maintained.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) in further view of El-Mankabady (US 20170091200).
With respect to claim 1, Gharabegian teaches A voice recognition system, comprising:
a microphone ([0041] In some embodiments, referring back to FIG. 1A, a mobile computing device 110 may communicate with a shading system with artificial intelligence or voice control capabilities. In embodiments, a user may communicate with a mobile computing or communications device 110 by speaking a verbal command into one or more microphones. In some embodiments, a mobile computing or communications device 110 may communicate a digital or analog audio file to one or more processors 127 and/or an AI or voice control API 140 in a shading device housing.) configured to receive one or more spoken dialogue commands from a user in a voice recognition session ([0041] In some embodiments, referring back to FIG. 1A, a mobile computing device 110 may communicate with a shading system with artificial intelligence or voice control capabilities. In embodiments, a user may communicate with a mobile computing or communications device 110 by speaking a verbal command into one or more microphones.); and 
a processor in communication with the microphone, wherein the processor is configured ([0041] In some embodiments, referring back to FIG. 1A, a mobile computing device 110 may communicate with a shading system with artificial intelligence or voice control capabilities. In embodiments, a user may communicate with a mobile computing or communications device 110 by speaking a verbal command into one or more microphones. In some embodiments, a mobile computing or communications device 110 may communicate a digital or analog audio file to one or more processors 127 and/or an AI or voice control API 140 in a shading device housing.) to: 
receive one or more audio files associated with one or more audio events associated with the voice recognition system ([0038] For example, in some embodiments, a user may speak 191 a command or desired action (execute playlist, order replacement spokes/blades, and/or obtain traffic conditions from a traffic server) [events]  which is captured as an audio file and received at an AI or voice control API 140 stored in one or more memory devices of a shading device 170. As discussed above, in some embodiments, an AI or voice control API 140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).); 
execute the one or more audio files in a voice recognition session in an audio event ([0038] As discussed above, in some embodiments, an AI or voice control API 140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files [execute the audio file] to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).); and 
Gharabegian fails to explicitly disclose, but El-Mankabady teaches, output a log report indicating a result of the audio events with the voice recognition session ([0035] Upon determining that the audio message corresponds to the event represented in the log file 280, the FACP 200, under the control of the log file generator 275, may begin the embedding process by extracting the data representing the audio message from the audio file).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian in view of El-Mankabady to output a log report indicating a result of the audio events with the voice recognition session in order to improve the transmission of the file through the system ([0036], El-Mankabady).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Kwon (US 20170069317 A1).
With respect to claim 2, Gharabegian and El-Mankabady do not explicitly teach, but Kwon teaches wherein the log report is sent to a remote server associated with the voice recognition system ([0022] The voice recognition apparatus may further include a communication interface configured to transmit the log data to the server-based voice recognition apparatus to build the DB with respect to the recognition result in the server-based voice recognition apparatus.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Kwon to send the log report to a remote server associated with the voice recognition system in order to improve the recognition rate ([0064], Kwon).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Fukuda (US 20190012594 A1).
With respect to claim 3, Gharabegian and El-Mankabady do not explicitly teach, but Fukuda teaches wherein the processor is configured to receive the audio file via a socket connection ([0020] The speech signal may be provided as an audio file, an audio stream from input device such as microphone, or an audio stream via network socket.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Fukuda so that the processor is configured to receive the audio file via a socket connection in order to improve recognition accuracy ([0087], Fukuda).


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1), El-Mankabady (US 20170091200), and Fukuda (US 20190012594 A1) as applied to claim 3, in further view of Juckett (US 20200126541 A1).
With respect to claim 4 Gharabegian, El-Mankabady and Fukuda do not explicitly teach but Juckett teaches wherein the socket connection includes a transmission control protocol (TCP) socket connection ([0105] In this exemplary embodiment, audio data sequence sender 130 is contemplated to write data to a socket, which can generally be defined as a one-to-one network connection. Thereafter, the transport layer may wrap the audio data sequence in a segment and “hand” it to the network layer, which will thereafter route this audio data sequence receiver 140 at a transcription workstation. Optionally, on the other side of this communication, the network layer will deliver the audio data sequence to the transport control protocol (TCP).).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian, El-Mankabady, and Fukuda in view of Juckett so that the socket connection includes a transmission control protocol (TCP) socket connection in order that data packets will not be delivered out of order ([0105], Juckett).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Torok (US 20180233137 A1).

With respect to claim 5, Gharabegian and El-Mankabady do not explicitly teach, but Torok teaches wherein the one or more audio files are received from a remote server including an audio library with additional audio files ([0036] Such remote (or cloud-based) content sources 119 are commonly known as content streaming sources where the user 102 subscribes to a service allowing the user 102 to access a library of audio files made available to the user 102 from the content sources 119.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Torok so that the one or more audio files are received from a remote server including an audio library with additional audio files in order to improve the likelihood that the ASR process will output speech results that make sense grammatically ([0101], Torok).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Boudin (US 20200196006 A1).
With respect to claim 6, Gharabegian and El-Mankabady do not explicitly teach, but Boudin teaches wherein the one or more audio files include simulations of one or more spoken dialogue commands ([0147] In one particular embodiment according to the invention, the generation of the audio file comprises the synthesis of different voices (for example, for highlighting different types of text (e.g., comments), speech synthesis of a dialogue comprising different speakers with, for each character, the appropriate feminine/masculine gender voice, age voice, etc.)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Boudin so that the one or more audio files include simulations of one or more spoken dialogue commands when the synthesis of different voices brings an interest ([0147], Boudin).

Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Subramaniam (US 20200081939 A1).
With respect to claim 7, Gharabegian and El-Mankabady do not explicitly teach, but Subramaniam teaches wherein the audio file includes a JSON file ([0036] In one embodiment, each call recording includes metadata for each call recorded and stores the data in following ways [0037] 1. As part of the Audio file (MP3/WAV headers) [0038] 2. Additional metadata file (JSON/XML) [0039] 3. Providing an API).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Subramaniam so that the one or more audio files include a JSON file in order to fetch the metadata from a supplemental file ([0040], Subramaniam).

With respect to claim 8, Gharabegian and El-Mankabady do not explicitly teach, but Subramaniam teaches wherein the one or more audio files include associated metadata ([0040] The extraction module 214 extracts the metadata, stored as above, from each call recording by following methodologies respectively. [0041] 1. MP3/WAV file decoders to parse the metadata in audio file. [0042] 2. JSON/XML parsers to fetch the metadata from supplemental file for a given audio file).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Subramaniam so that the one or more audio files include associated metadata in order to fetch the metadata from a supplemental file ([0040], Subramaniam).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and El-Mankabady (US 20170091200) as applied to claim 1, in further view of Rao (US 20200243077 A1).
With respect to claim 9, Gharabegian and El-Mankabady do not explicitly teach, but Rao teaches wherein the processor is configured to receive one or more audio mapping files that link a test flow into the voice recognition session for a test case ([0047] Audio files may comprise computer-readable data containing audio speech signals captured and stored during or after phone calls associated with callers. In some cases, an audio file may be stored or retrieved from a server or database. And in some cases, an audio file may be incoming data stream for an ongoing call, where the analysis server decodes the call audio to convert or otherwise process the call signal of the ongoing call in real-time or near real-time. In the exemplary process 200, the audio files may be transmitted to or received by the analysis server in order to train, test, and/or implement (e.g., execute a query) the models used by the analysis server.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and El-Mankabady in view of Rao to receive one or more audio mapping files that link a test flow into the voice recognition session for a test case in order to identify corresponding test audio segments likely having similar acoustic regions as each of the candidate audio segments ([0087], Rao).

Claims 10, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) in further view of Torok (US 20180233137 A1) and Thomson (US 20190013038 A1).

With respect to claim 10, Gharabegian teaches A voice recognition system, comprising: 
a processor in communication with a microphone ([0041] In some embodiments, referring back to FIG. 1A, a mobile computing device 110 may communicate with a shading system with artificial intelligence or voice control capabilities. In embodiments, a user may communicate with a mobile computing or communications device 110 by speaking a verbal command into one or more microphones. In some embodiments, a mobile computing or communications device 110 may communicate a digital or analog audio file to one or more processors 127 and/or an AI or voice control API 140 in a shading device housing.), 
wherein the processor is configured to: 
receive one or more audio files associated with one or more audio events associated with the voice recognition system ([0038] For example, in some embodiments, a user may speak 191 a command or desired action (execute playlist, order replacement spokes/blades, and/or obtain traffic conditions from a traffic server) which is captured as an audio file and received at an AI or voice control API 140 stored in one or more memory devices of a shading device 170. As discussed above, in some embodiments, an AI or voice control API 140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).); 
execute the one or more audio files in a voice recognition session in an audio event in a conversational assistant system of the voice recognition system ([0038] As discussed above, in some embodiments, an AI or voice control API [conversational assistant]140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files [execute the audio file] to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).), 
[[wherein the one or more audio files are retrieved via a socket connection from a test engine]]; and 
[[output a log report indicating a result of the audio events with the voice recognition session, wherein the log report includes at least one voice recognition evaluation statistic]].
Gharabegian fails to explicitly disclose but Torok teaches wherein the one or more audio files are retrieved via a socket connection from a test engine ([0036] Such remote (or cloud-based) content sources 119 are commonly known as content streaming sources where the user 102 subscribes to a service allowing the user 102 to access a library of audio files made available to the user 102 from the content sources 119 and [0089] For example, via the antenna(s), the input/output device interfaces 208 may connect to network devices of one or more networks 116 via a wireless local area network (WLAN) (such as WiFi) radio, Bluetooth, and/or wireless network radio, such as a radio capable of communication with a wireless communication network such as a Long Term Evolution (LTE) network, WiMAX network, 3G network, etc. A wired connection such as Ethernet may also be supported. Through the network(s) 116, the speech processing system may be distributed across a networked environment.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian in view of Torok so that the one or more audio files are retrieved via a socket connection from a test engine in order to improve the likelihood that the ASR process will output speech results that make sense grammatically ([0101], Torok).
Gharabegian and Torok fail to explicitly disclose, but Thomson teaches, output a log report indicating a result of the audio events with the voice recognition session, wherein the log report includes at least one voice recognition evaluation statistic ([0027] In some embodiments, the masking system 100 includes a reporting module 125. The reporting module 125 may generate reports about high-level metrics associated with the masking process. Such reports may be based on the data stored in the reporting log 140. Examples of reports for overall system performance include estimates of speech recognition accuracy [voice recognition evaluation statistic], values and statistics related to traffic, a percentage of calls with SPI, and a percentage of types of redaction (e.g., whether the agent or the real-time redactor 110 initiated the redaction)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and Torok in view of Thomson to output a log report indicating a result of the audio events with the voice recognition session, wherein the log report includes at least one voice recognition evaluation statistic, for employing multiple processor designs for increased computing capability ([0101], Thomson).

With respect to claim 12, Torok further teaches wherein the test engine is located on a remote server ([0036] Such remote (or cloud-based) content sources 119 are commonly known as content streaming sources where the user 102 subscribes to a service allowing the user 102 to access a library of audio files made available to the user 102 from the content sources 119).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian in view of Torok so that the test engine is located on a remote server in order to improve the likelihood that the ASR process will output speech results that make sense grammatically ([0101], Torok).

With respect to claim 14, Torok further teaches wherein the one or more audio files are received from a remote server including an audio library with additional audio files ([0036] Such remote (or cloud-based) content sources 119 are commonly known as content streaming sources where the user 102 subscribes to a service allowing the user 102 to access a library of audio files made available to the user 102 from the content sources 119.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian in view of Torok so that the one or more audio files are received from a remote server including an audio library with additional audio files in order to improve the likelihood that the ASR process will output speech results that make sense grammatically ([0101], Torok).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) in further view of Torok (US 20180233137 A1), Thomson (US 20190013038 A1), and El-Mankabady (US 20170091200).

With respect to claim 11, Gharabegian, Torok, and Thomson do not explicitly disclose, but El-Mankabady teaches wherein the processor is configured to output the log report after receiving the result via the socket connection ([0035] Upon determining that the audio message [result] corresponds to the event represented in the log file 280, the FACP 200, under the control of the log file generator 275, may begin the embedding process by extracting the data representing the audio message from the audio file, and [0026] In yet a further exemplary embodiment, the data controller 510 may include a proxy or gateway application (GA) 255 stored in the memory 514 and operable to establish a connection between the data controller 510 and other devices within the system 100, e.g., the FACP 200, mobile device 400, or other server of the DHP 500 via the network interface device 516. The GA 255, in general, may be, e.g., a proxy or SSH tunnel with socket connectivity configured to establish a connection).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian, Torok, and Thomson in view of El-Mankabady to output the log report after receiving the result via the socket connection in order to improve the transmission of the file through the system ([0036], El-Mankabady).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1), Torok (US 20180233137 A1), and Thomson (US 20190013038 A1) as applied to claim 10, in further view of Juckett (US 20200126541 A1).
With respect to claim 13 Gharabegian, Torok, Thomson do not explicitly teach but Juckett teaches wherein the socket connection includes a transmission control protocol (TCP) socket connection ([0105] In this exemplary embodiment, audio data sequence sender 130 is contemplated to write data to a socket, which can generally be defined as a one-to-one network connection. Thereafter, the transport layer may wrap the audio data sequence in a segment and “hand” it to the network layer, which will thereafter route this audio data sequence receiver 140 at a transcription workstation. Optionally, on the other side of this communication, the network layer will deliver the audio data sequence to the transport control protocol (TCP), which can make it “available” to audio data sequence receiver 140 as an exact copy of the data sent, i.e., TCP will not deliver packets out of order, and will wait for a retransmission in case it notices a gap in the byte stream.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian, Torok, and Thomson in view of Juckett so that the socket connection includes a transmission control protocol (TCP) socket connection in order that data packets will not be delivered out of order ([0105], Juckett).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1), Torok (US 20180233137 A1), and Thomson (US 20190013038 A1) as applied to claim 10, in further view of Boudin (US 20200196006 A1).
With respect to claim 15, Gharabegian, Torok, and Thomson do not explicitly teach, but Boudin teaches wherein the one or more audio files include simulations of one or more spoken dialogue commands ([0147] In one particular embodiment according to the invention, the generation of the audio file comprises the synthesis of different voices (for example, for highlighting different types of text (e.g., comments), speech synthesis of a dialogue comprising different speakers with, for each character, the appropriate feminine/masculine gender voice, age voice, etc.)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian, Torok, and Thomson in view of Boudin so that the one or more audio files include simulations of one or more spoken dialogue commands when the synthesis of different voices brings an interest ([0147], Boudin).


Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1), Torok (US 20180233137 A1), and Thomson (US 20190013038 A1) as applied to claim 10, in further view of Ueda (US 20150078730 A1).
With respect to claim 16, Gharabegian, Torok, and Thomson do not explicitly teach, but Ueda teaches wherein the audio file includes a CSV file ([0110] In the first embodiment, the text file in the CSV format and the text file using the tag have been described as examples of the management information file. As long as metadata associated with video data or audio data can be recorded, any file formats can be employed. Thus, the present disclosure is not limited to the management information file in the CSV format and the text file using the tag. For example, metadata may be recorded using an extensible markup language (XML) format which is one of markup languages.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian, Torok, and Thomson in view of Ueda so that the audio file includes a CSV file in order to improve efficiency of editing ([0003], Ueda).

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) in further view of Williams (US 20180329676 A1)
With respect to claim 17, Gharabegian teaches A voice recognition system, comprising: a computer readable medium storing instruction that, when executed by a processor, cause the processor to ([0033] In some embodiments, a simple AI or voice control module may include computer-readable instructions 135 stored in one or more memory devices 134. In some embodiments, the computer-readable instructions 135 may be executable by one or more processors 136 to transfer or communicate the generated audio files to an external computing device 137 for voice recognition and then conversion into one or more shading device commands. In some embodiments, the one or more shading device commands may then be received back from the external computing device 137):
receive one or more audio files associated with one or more audio events associated with the voice recognition system ([0038] For example, in some embodiments, a user may speak 191 a command or desired action (execute playlist, order replacement spokes/blades, and/or obtain traffic conditions from a traffic server) which is captured as an audio file and received at an AI or voice control API 140 stored in one or more memory devices of a shading device 170. As discussed above, in some embodiments, an AI or voice control API 140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).); 
execute the one or more audio files in a voice recognition session in an audio event in a conversational assistant system of the voice recognition system ([0038] As discussed above, in some embodiments, an AI or voice control API [conversational assistant]140 may communicate and/or transfer 192 an audio file utilizing one or more of the shading device's transceiver to an external AI or voice control server 175. In embodiments, an external AI or voice control server 175 may receive one or more audio files and a voice recognition engine or module 185 may recognize and convert 193 received audio files [execute the audio file] to a query request (e.g., a traffic condition request, an e-commerce order, a request to retrieve and stream a digital music playlist).), 
[[wherein the one or more audio files are retrieved via a socket connection based on a test case.]]; and 
Gharabegian fails to explicitly disclose, but Williams teaches, wherein the one or more audio files are retrieved via a socket connection based on a test case ([0011] For example the security system may be in network communication with a remote server requiring authentication, which can be provided by the security system to the remote server in response to the receipt of the correct expected binary message by the security system. In one example the remote server may require authentication by the security system, e.g. by the delivery to the security system of a sonic tone embedding an expected binary message. In one use case the sonic tone may be delivered to the security system directly, and in alternate use cases the sonic tone may be recorded by a microphone at a user's device and then delivered as an audio file to the security system via a network connection. In other use cases, the remote server may require conventional authentication via a user name and password, and then require a second factor authentication by the security system in the manners described herein).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian in view of Williams so that the one or more audio files are retrieved via a socket connection based on a test case in order to provide a proxy security service and/or two-factor authentication ([0022], Williams).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and Williams (US 20180329676 A1) as applied to claim 17, in further view of El-Mankabady (US 20170091200).
With respect to claim 18, Gharabegian and Williams do not explicitly teach, but El-Mankabady teaches wherein the instructions further cause the processor to output a log report indicating a result of the audio events with the voice recognition session ([0035] Upon determining that the audio message corresponds to the event represented in the log file 280, the FACP 200, under the control of the log file generator 275, may begin the embedding process by extracting the data representing the audio message from the audio file).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and Williams in view of El-Mankabady to output a log report indicating a result of the audio events with the voice recognition session in order to improve the transmission of the file through the system ([0036], El-Mankabady).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and Williams (US 20180329676 A1) as applied to claim 17, in further view of Boudin (US 20200196006 A1).
With respect to claim 19, Gharabegian and Williams do not explicitly teach, but Boudin teaches wherein the one or more audio files each contain different voice commands with different dialogue ([0147] In one particular embodiment according to the invention, the generation of the audio file comprises the synthesis of different voices (for example, for highlighting different types of text (e.g., comments), speech synthesis of a dialogue comprising different speakers [different voices] with, for each character, the appropriate feminine/masculine gender voice [different dialogue], age voice, etc.)).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and Williams in view of Boudin so that the one or more audio files include simulations of one or more spoken dialogue commands when the synthesis of different voices brings an interest ([0147], Boudin).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Gharabegian (US 20190343253 A1) and Williams (US 20180329676 A1) as applied to claim 17, in further view of Rao (US 20200243077 A1).
With respect to claim 20, Gharabegian and Williams do not explicitly teach, but Rao teaches wherein the processor is further configured to store instructions that cause the processor to receive one or more audio mapping files that link a test flow into the voice recognition session for a test case ([0047] Audio files may comprise computer-readable data containing audio speech signals captured and stored during or after phone calls associated with callers. In some cases, an audio file may be stored or retrieved from a server or database. And in some cases, an audio file may be incoming data stream for an ongoing call, where the analysis server decodes the call audio to convert or otherwise process the call signal of the ongoing call in real-time or near real-time. In the exemplary process 200, the audio files may be transmitted to or received by the analysis server in order to train, test, and/or implement (e.g., execute a query) the models used by the analysis server.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify Gharabegian and Williams in view of Rao to receive one or more audio mapping files that link a test flow into the voice recognition session for a test case in order to identify corresponding test audio segments likely having similar acoustic regions as each of the candidate audio segments ([0087], Rao).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Thomson (US 20200175961 A1) teaches “[0976] In some embodiments, the environment 5600 also includes a scheduler 5614 configured to receive input test requests from an operator (a person). Requests may include how many tests to run, when tests should be complete, types of tests to be run, which transcription units 5644 should be tested, which CAs should be tested using associated transcription units, and under what conditions to run tests. The scheduler 5614 may be responsive to test requests to generate a set of test parameters, which may include when to run tests, which audio files to use for testing, a schedule for which transcription units 5644 to test, and how many tests to run simultaneously. The scheduler 5614 may query or receive input from a transcription unit scheduling system or other operations and administration systems to determine operations status such as transcription unit load, traffic load, transcription unit availability, and may alter test parameters to avoid interfering with the transcription of audio from live communication sessions when the transcription units may be part of a transcription service. Additionally or alternatively, the scheduler 5614 may run tests on demand from an operator or team lead supervisor (“TLS,” a.k.a. CA manager) based on received requests.”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA, whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.   Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657