DETAILED ACTION

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by U.S. Patent No. 8,150,020 B1 to Blanchard et al. (hereinafter “Blanchard”).
Regarding claim 1, Blanchard discloses a computer-implemented method for processing a call drop likelihood prediction for an interactive call data object (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), the computer-implemented method comprising: identifying, using one or more processors, a group of interactive call feature data objects associated with the interactive call data object (Abstract, column 1, lines 7-17 and column 7, lines 27-32; detecting of user hang ups early in the call, correlating the number of hang ups to the number of hang ups for various versions of the same prompt or series of prompts, providing analysis of the hang up data such as identifying where in the call users hang up, and automatically optimizing replacement prompt selection based on hang-up data), wherein the group of interactive call feature data objects comprises an interactive call audio data object and an interactive call metadata object (column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases); processing, using the one or more processors, the group of interactive call feature data objects using a real-time call monitoring machine learning framework to generate the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), wherein processing the group of interactive call feature data objects using the real-time call monitoring machine learning framework comprises: processing the interactive call audio data object using an audio data processing machine learning model of the real-time call monitoring machine learning framework to generate an audio-based embedding data object of a plurality of inferred interactive call embedding data objects for the interactive call data object, processing the interactive call audio data object using an audio transcript processing machine learning model of the real-time call monitoring machine learning framework to generate a transcript-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object (column 2, lines 47-58 and column 6, lines 10-37; column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), and generating the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object; and performing, using the one or more processors, one or more prediction-based actions based at least in part on the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Regarding claim 2, Blanchard discloses the computer-implemented method of claim 1, wherein: the group of interactive call feature data objects further comprises an interactive call event sequence descriptor data object, and processing the group of interactive call feature data objects using the real-time call monitoring machine learning framework further comprises processing the interactive call event sequence descriptor data object using an event sequence processing machine learning model of the real-time call monitoring machine learning framework to generate an event-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object (column 6, lines 27-37;  data base of stored voice prompts may include sequences of phrases).

Regarding claim 3, Blanchard discloses the computer-implemented method of claim 2, wherein the event sequence processing machine learning model comprises one or more sequential processing layers (column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 4, Blanchard discloses the computer-implemented method of claim 3, wherein each sequential processing layer of the one or more sequential processing layers is selected from a group consisting of a recurrent neural network layer and a gated recurrent unit layer (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 5, Blanchard discloses the computer-implemented method of claim 2, wherein: the interactive call event sequence descriptor data object describes an ordered sequence of one or more interactive call events, each interactive call event of the one or more interactive call events is selected from a group of candidate interactive call events, and the group of candidate interactive call events are associated with an interactive voice response system associated with the interactive call data object (column 2, lines 42-47 and column 6, lines 10-20;  The two measuring steps may be performed simultaneously. The caller event rates may be measured by alternatively playing the test phrase and the alternative phrase, or by playing the phrases in random order, or by using alternative phrases in voice prompts in a predetermined pattern. The contingency may further include a comparison of a calculated caller event rate to a predetermined threshold value caller event rate. The contingency in the method may be based on a selecting an alternative phrase with the lowest caller event rate).

Regarding claim 6, Blanchard discloses the computer-implemented method of claim 1, wherein the audio data processing machine learning model comprises:
an audio transformation layer that is configured to process the interactive call audio data object to generate a transformed audio data object, and one or more sequential processing layers that are collectively configured to generate the audio-based embedding data object based at least in part on the transformed audio data object (column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 7, Blanchard discloses the computer-implemented method of claim 6, wherein:
the audio data processing machine learning model further comprises a convolutional layer that is configured to process the transformed audio data object to generate a convolutional output data object; and generating the audio-based embedding data object based at least in part on the transformed audio data object comprises processing the convolutional output data object using the one or more sequential processing layers to generate the audio-based embedding data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 8, Blanchard discloses the computer-implemented method of claim 1, wherein the audio transcript processing machine learning model comprises:
a transcription layer that is configured to process the interactive call audio data object to generate an audio transcript data object, and one or more sequential processing layers that are collectively configured to generate the transcript-based embedding data object based at least in part on the audio transcript data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 9, Blanchard discloses the computer-implemented method of claim 1, wherein the interactive call metadata object comprises one or more caller identifier descriptor data objects that describe one or more caller identifier features associated with a caller identifier profile for the interactive call data object (column 7, lines 27-51; caller identification and routing, balance inquiry, and airline ticket booking. IVR systems are generally used at the front end of call centers to identify which service the caller wants, to retrieve numeric information such as the caller's account numbers, and to provide answers to simple questions such as account balances or pre-recorded information).

Regarding claim 10, Blanchard discloses the computer-implemented method of claim 1, wherein generating the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object comprises:
processing the plurality of inferred interactive call embedding data objects and the interactive call metadata object using a feature merger machine learning model to generate a merged feature data object, processing the merged feature data object using a dense processing machine learning model to generate a dense model output data object, and generating the call drop likelihood prediction based at least in part on the dense model output data object (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Regarding claim 11, Blanchard discloses an apparatus for processing a call drop likelihood prediction for an interactive call data object, the apparatus comprising at least one processor and at least one memory including program code, the at least one memory and the program code configured to (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), with the at least one processor, cause the apparatus to at least:
identify a group of interactive call feature data objects associated with the interactive call data object, wherein the group of interactive call feature data objects comprises an interactive call audio data object and an interactive call metadata object (Abstract, column 1, lines 7-17 and column 7, lines 27-32; detecting of user hang ups early in the call, correlating the number of hang ups to the number of hang ups for various versions of the same prompt or series of prompts, providing analysis of the hang up data such as identifying where in the call users hang up, and automatically optimizing replacement prompt selection based on hang-up data); process the group of interactive call feature data objects using a real-time call monitoring machine learning framework to generate the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), wherein processing the group of interactive call feature data objects using the real-time call monitoring machine learning framework comprises: process the interactive call audio data object using an audio data processing machine learning model of the real-time call monitoring machine learning framework to generate an audio-based embedding data object of a plurality of inferred interactive call embedding data objects for the interactive call data object (column 2, lines 47-58 and column 6, lines 10-37; column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), process the interactive call audio data object using an audio transcript processing machine learning model of the real-time call monitoring machine learning framework to generate a transcript-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object, and generate the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object; and perform one or more prediction-based actions based at least in part on the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Regarding claim 12, Blanchard discloses the apparatus of claim 11, wherein: the group of interactive call feature data objects further comprises an interactive call event sequence descriptor data object, and the at least one memory and the program code are configured to, with the at least one processor, cause the apparatus to process the group of interactive call feature data objects using the real-time call monitoring machine learning framework by processing the interactive call event sequence descriptor data object using an event sequence processing machine learning model of the real-time call monitoring machine learning framework to generate an event-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 13, Blanchard discloses the apparatus of claim 11, wherein the audio data processing machine learning model comprises:
an audio transformation layer that is configured to process the interactive call audio data object to generate a transformed audio data object, a convolutional layer that is configured to process the transformed audio data object to generate a convolutional output data object, and one or more sequential processing layers that are collectively configured to generate the audio-based embedding data object based at least in part on the convolutional output data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 14, Blanchard discloses the apparatus of claim 11, wherein the audio transcript processing machine learning model comprises:
a transcription layer that is configured to process the interactive call audio data object to generate an audio transcript data object, and one or more sequential processing layers that are collectively configured to generate the transcript-based embedding data object based at least in part on the audio transcript data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 15, Blanchard discloses the apparatus of claim 11, wherein the at least one memory and the program code are configured to, with the at least one processor, cause the apparatus to generate the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object by: processing the plurality of inferred interactive call embedding data objects and the interactive call metadata object using a feature merger machine learning model to generate a merged feature data object, processing the merged feature data object using a dense processing machine learning model to generate a dense model output data object, and generating the call drop likelihood prediction based at least in part on the dense model output data object (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Regarding claim 16, Blanchard discloses a non-transitory computer storage medium comprising instructions for processing a call drop likelihood prediction for an interactive call data object, the instructions being configured to cause one or more computer processors to at least perform operations configured to (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts):
identify a group of interactive call feature data objects associated with the interactive call data object, wherein the group of interactive call feature data objects comprises an interactive call audio data object and an interactive call metadata object (Abstract, column 1, lines 7-17 and column 7, lines 27-32; detecting of user hang ups early in the call, correlating the number of hang ups to the number of hang ups for various versions of the same prompt or series of prompts, providing analysis of the hang up data such as identifying where in the call users hang up, and automatically optimizing replacement prompt selection based on hang-up data); process the group of interactive call feature data objects using a real-time call monitoring machine learning framework to generate the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), wherein processing the group of interactive call feature data objects using the real-time call monitoring machine learning framework comprises: process the interactive call audio data object using an audio data processing machine learning model of the real-time call monitoring machine learning framework to generate an audio-based embedding data object of a plurality of inferred interactive call embedding data objects for the interactive call data object, process the interactive call audio data object using an audio transcript processing machine learning model of the real-time call monitoring machine learning framework to generate a transcript-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object (column 2, lines 47-58 and column 6, lines 10-37; column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts), and generate the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object; and perform one or more prediction-based actions based at least in part on the call drop likelihood prediction (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Regarding claim 17, Blanchard discloses the non-transitory computer storage medium of claim 16, wherein: the group of interactive call feature data objects further comprises an interactive call event sequence descriptor data object, and the instructions are configured to cause the one or more computer processors to at least perform operations configured to process the group of interactive call feature data objects using the real-time call monitoring machine learning framework by processing the interactive call event sequence descriptor data object using an event sequence processing machine learning model of the real-time call monitoring machine learning framework to generate an event-based embedding data object of the plurality of inferred interactive call embedding data objects for the interactive call data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 18, Blanchard discloses the non-transitory computer storage medium of claim 16, wherein the audio data processing machine learning model comprises:
an audio transformation layer that is configured to process the interactive call audio data object to generate a transformed audio data object, a convolutional layer that is configured to process the transformed audio data object to generate a convolutional output data object, and one or more sequential processing layers that are collectively configured to generate the audio-based embedding data object based at least in part on the convolutional output data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 19, Blanchard discloses the non-transitory computer storage medium of claim 16, wherein the audio transcript processing machine learning model comprises:
a transcription layer that is configured to process the interactive call audio data object to generate an audio transcript data object, and one or more sequential processing layers that are collectively configured to generate the transcript-based embedding data object based at least in part on the audio transcript data object (column 4, lines 5-25 and column 6, lines 27-37; data base of stored voice prompts may include sequences of phrases).

Regarding claim 20, Blanchard discloses the non-transitory computer storage medium of claim 16, wherein the instructions are configured to cause the one or more computer processors to at least perform operations configured to generate the call drop likelihood prediction based at least in part on the plurality of inferred interactive call embedding data objects and the interactive call metadata object by: processing the plurality of inferred interactive call embedding data objects and the interactive call metadata object using a feature merger machine learning model to generate a merged feature data object, processing the merged feature data object using a dense processing machine learning model to generate a dense model output data object, and generating the call drop likelihood prediction based at least in part on the dense model output data object (column 4, line 67 to column 5, line 3 and column 8, lines 1-5; increasing the likelihood that callers choose to remain on the call and complete their requests. Callers will be less likely to opt out of, or hang up on, an automated telephone system that has been well trailed and therefore delivers well-tuned voice prompts).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AKELAW A TESHALE whose telephone number is (571)270-5302. The examiner can normally be reached from 9 am to 6 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

AKELAW TESHALE
Primary Examiner
Art Unit 2653



/AKELAW TESHALE/Primary Examiner, Art Unit 2653