DETAILED ACTION

Introduction
This office action is in response to Applicant’s submission filed on 12/28/2020. Claims
1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) is acknowledged. The prior-filed application is provisional application number 62/955,963, filed on 12/31/2019.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/10/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings filed on 12/28/2020 have been accepted and considered by the Examiner.



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 4, 6, 12, 13, 15, 16 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1 and 16 recite “receiving an audio segment comprising a portion of audio captured by a microphone located within a vehicle providing a ride to a user of a ride-sharing application associated with a ride-sharing service; converting the audio segment to a text segment; accessing a prediction model associated with verbal harassment detection; providing at least the text segment to the prediction model to obtain a harassment prediction; providing the audio segment to an emotion detector to obtain a detected emotion of a speaking user that made an utterance included in the audio segment; and determining based at least in part on the harassment prediction and the detected emotion that the user is being harassed.”
The limitations of “receiving…”, “converting…”, “accessing…”, “providing…”, and “determining…”, as drafted, cover a human organizing of activities. More specifically, they read on a person listening to audio from a speaker/user; writing down what is said in increments; mentally determining whether what is said constitutes some form of harassment; sensing, from what is said, the type of emotion expressed in the speaker’s voice; and, based on the listener’s judgment of what harassment should feel like in combination with an evaluation of the emotion expressed by the speaker, making a determination that the speaker/user is being harassed.
This judicial exception is not integrated into a practical application. In particular, independent claims 1 and 16 recite the additional elements of a “hardware processor”, and/or “memory or nonvolatile storage”, and a “microphone”. For example, paragraph [00123] of the as-filed specification describes using a conventional or general-purpose computer, such as a personal computer, a computerized tablet, a PDA, etc. Regarding the microphone, the specification provides no specific details. As such, any conventional or general-purpose microphone can be applied. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Thus, the claims are directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than a general-purpose computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The same reasoning applies to the microphone, whether it is located in a car and/or in a cellphone as described. Further, the additional limitations in the claims noted above are directed towards insignificant extra-solution activity. Thus, the claims are not patent eligible.
With respect to claim 4, the claim relates to initiating an intervention process upon determining that the user is being harassed. This reads on a human calling 911, calling the rideshare company, or telling the driver or the person conducting the harassment to stop. No additional limitations are present.
With respect to claim 6, the claim relates to wherein the microphone is included on a wireless device executing the ride-sharing application. This reads on the placement of a microphone in a cellphone. No additional limitations are present.
Regarding claims 12 and 20, the claims relate to wherein determining that the user is being harassed comprises determining that at least one of a harassment prediction probability satisfies a first harassment probability threshold or a measure of the detected emotion exceeds a second harassment probability threshold. This reads on a human mentally judging whether someone has crossed the line and has done enough to warrant a harassment label, or whether the sound detected from the speaker/user has crossed the threshold of what harassment sounds like. No additional limitations are present.
Regarding claim 13, the claim relates to wherein determining that the user is being harassed comprises determining that an aggregation of a harassment prediction probability and a measure of the detected emotion exceeds a third harassment probability threshold. This reads on a human mentally combining a judgment of what harassment is with the perceived emotions coming from the speaker/user, and determining whether the aggregated result has crossed a threshold, or gone too far. No additional limitations are present.
With respect to claim 15, the claim relates to wherein the speaking user is one of the user being harassed or a user performing the harassment. This reads on a human determining whether the speaker is the offender or the victim. No additional limitations are present.
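For illustration only, the threshold determinations recited in claims 12, 13 and 20 can be sketched as follows. This is a minimal sketch under stated assumptions: the class and function names, the numeric threshold values, and the use of a weighted mean as the "aggregation" are all hypothetical and are not drawn from the claims, the specification, or the cited references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical placeholder values; the claims recite no numbers.
    first: float = 0.8    # compared against the harassment prediction probability
    second: float = 0.7   # compared against the measure of detected emotion
    third: float = 0.75   # compared against the aggregation of the two


def either_threshold_met(p_harass: float, emotion: float, t: Thresholds) -> bool:
    """Claims 12 and 20 style: at least one measure crosses its own threshold."""
    return p_harass >= t.first or emotion > t.second


def aggregate_exceeds(p_harass: float, emotion: float, t: Thresholds,
                      weight: float = 0.5) -> bool:
    """Claim 13 style: an aggregation exceeds a third threshold.

    The weighted mean used here is an assumption; the claim does not
    specify how the aggregation is computed.
    """
    aggregate = weight * p_harass + (1.0 - weight) * emotion
    return aggregate > t.third
```

With the placeholder values above, a prediction probability of 0.9 alone satisfies the first determination, while the second determination depends on both inputs together.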

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 11, 15-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hodge et al. (US Patent Application Publication No: US 20200349666 A1) hereinafter as Hodge, in view of Xu et al. (CN 109256150 A) with reference to English machine translation provided, hereinafter as Xu, and further in view of Nagula (US Patent No: US 11409963 B1) hereinafter as Nagula.

Regarding claim 1, Hodge discloses: A computer-implemented method of predicting an occurrence of harassment of a user of a ride-sharing application, the computer-implemented method comprising: as implemented by an interactive computing system comprising one or more hardware processors and configured with specific computer-executable instructions (See Fig. 2, which shows a computer system comprising a hardware processor, memory, etc.),
receiving an audio segment comprising a portion of audio captured by a microphone located within a vehicle providing a ride to a user of a ride-sharing application associated with a ride-sharing service ([0029] According to another aspect of one embodiment, video and audio recorded during a ride by a vehicle-mounted client device may include driver behavior, such as, attention to driving conditions, distractions, and interactions with the customers, allowing for example, training or coaching of drivers. [0069] In addition, user input may be received through one or more microphones 212. In one embodiment, microphone 212 is a digital microphone connected to audio module 206 to receive user spoken input, such as user instructions or commands. Microphone 212 may also be used for other functions, such as user communications, audio component of video recordings, or the like.) {audio segment in particular is taught in Xu reference below};
converting the audio segment to a text segment ([0080]  The user's utterance is processed by a speech-to-text algorithm and the resulting text is stored as metadata associated with the video clip.);
associated with verbal harassment detection ([0104] According to another embodiment, client device 101 records video and audio of the driver and how he or she interacts with the passenger during a ride. Client device 101 continuously monitors the driver (as well as the passenger, as described above) for any uncomfortable actions and conversations towards the passenger, including any threats or sexual comments, swearing, smoking, or similar actions. Audio and image recognition algorithms continuously analyze the audio and video from the cabin-facing camera as further described above. If any recognizable events occur, client device can announce to the driver that the inappropriate behavior is being recorded and stored in the cloud server and cannot be erased . . . and that such behavior should stop.);

Hodge does not explicitly, but Xu discloses: audio segment ([pg. 2, 5th para] cut-off sentence module, for receiving the recording data transmitted from the recording module, the recording data is cut into sections according to relevant characteristic of phonetic;)
providing the audio segment to an emotion detector to obtain a detected emotion of a speaking user that made an utterance included in the audio segment ([pg. 2, 4th-7th para] A voice emotion identification system based on machine learning, comprising a recording module, sentence breaking module, speaker recognition module, a characteristic extracting module and emotion identification module, wherein, recording module for obtaining the recording data, using a correlation algorithm to noise pre-recording data; cut-off sentence module, for receiving the recording data transmitted from the recording module, the recording data is cut into sections according to relevant characteristic of phonetic; speaker recognition module for receiving the cut-off sentence transmitted from the module segments using a machine learning algorithm classifies the segment, and identifying the speaker according to the classification; a characteristic extracting module for receiving a segment transmitted from the cut-off sentence module, extracting the frequency spectrum characteristic of each segment and Mel frequency cepstrum coefficient, and after processing on the extracted segment features; emotional recognition module, feature extraction module for receiving the generated segment features through machine learning algorithm training the sentiment prediction model, and using integrated algorithm to integrate the prediction result of each model.);
and the detected emotion that the user is being harassed ([pg. 3, 4th para] As shown in FIG. 1, according to an embodiment of the present invention the voice emotion identification system based on machine learning, comprising a recording module, sentence breaking module, speaker recognition module, a characteristic extracting module and emotion identification module, wherein, recording module, sentence-break module and a characteristic extracting module belonging to the data pre-processing and recording module, sentence-break module and feature extraction module provides the prediction basis and improve the accuracy and stability in the prediction process, and provides can be used to predict characteristic. speaker recognition module and emotion identification module belonging to the prediction using fragment and characteristic data pre-obtained speaker and emotion of each segment for prediction; ).
Hodge and Xu are considered analogous art because both are in the related art of speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge with the teaching of Xu, to incorporate providing the audio segment to an emotion detector to obtain a detected emotion of the speaker that made the utterance. Combining the disclosures may improve the performance of the emotion prediction model, because features are extracted to represent each segment and the extracted features are passed to the emotion recognition module, as suggested by Xu (pg. 4, last para - pg. 5, 1st para).
Hodge in view of Xu does not explicitly, but Nagula discloses: accessing a prediction model ([col. 2, lines 6-13] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.)
providing at least the text segment to the prediction model to obtain a harassment prediction ([col. 4, lines 22-29] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.);
and determining based at least in part on the harassment prediction ([col. 4, lines 22-29] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.).
Hodge, Xu and Nagula are considered analogous art because they are all in the related art of language understanding and recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, with the teaching of Nagula, to incorporate a prediction model. One would have been motivated to combine the disclosures because the disclosed technologies can be deployed in any organization including, for example, government agencies, banks, and hospitals, where real-time concept generation from text reports can help human decision makers to understand the nature of the text reports, to gain insights for improving operational efficiency, and especially, to improve the user experience of people using the services provided by such organizations, as suggested by Nagula (col. 2, lines 14-21).
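For illustration only, the sequence of steps recited in claim 1 can be sketched as a pipeline in which the audio segment feeds both a speech-to-text stage and an emotion detector, and the text segment feeds a prediction model. This is a minimal sketch under stated assumptions: every function name and toy stand-in below is hypothetical and does not represent the implementation of Hodge, Xu, Nagula, or the applicant.

```python
from typing import Callable


def detect_harassment(
    audio_segment: bytes,
    speech_to_text: Callable[[bytes], str],
    prediction_model: Callable[[str], float],
    emotion_detector: Callable[[bytes], float],
    decide: Callable[[float, float], bool],
) -> bool:
    """Run the claim 1 steps in order on one audio segment."""
    text_segment = speech_to_text(audio_segment)             # converting
    harassment_prediction = prediction_model(text_segment)   # providing text to the model
    detected_emotion = emotion_detector(audio_segment)       # providing audio to the detector
    return decide(harassment_prediction, detected_emotion)   # determining


# Toy stand-ins (assumptions) so the sketch runs without any real models.
def toy_speech_to_text(audio: bytes) -> str:
    # Pretends the bytes already hold a UTF-8 transcript.
    return audio.decode("utf-8")


def toy_prediction_model(text: str) -> float:
    # Flags a single hypothetical keyword instead of using a trained model.
    return 1.0 if "threat" in text.lower().split() else 0.0


def toy_emotion_detector(audio: bytes) -> float:
    # Treats an all-uppercase transcript as a crude proxy for shouting.
    return 0.9 if audio.isupper() else 0.1


def toy_decide(prediction: float, emotion: float) -> bool:
    # Hypothetical rule: both signals must reach 0.5.
    return prediction >= 0.5 and emotion >= 0.5
```

Calling `detect_harassment(b"THIS IS A THREAT", toy_speech_to_text, toy_prediction_model, toy_emotion_detector, toy_decide)` exercises each stage once; in practice each stand-in would be replaced by a trained component.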

Regarding claim 2, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1, 
Nagula further discloses: wherein the prediction model comprises at least one of a hierarchical attention network model, a fastText model, or a convolutional neural network model ([col. 3, lines 27-40] In some implementations, either before or during the clustering process, the clustering subsystem 110 uses a tokenization engine 112 to generate tokenized representations of input text reports. The tokenized representation provides corresponding subjective information of the input text reports. The tokenization engine 112 may be configured as, for example, a doc2vec model, a bag-of-words model, a fastText model, and so on. Typically, representing text in vectorized, numeric, or any other software-friendly forms enables software to more smoothly and effectively process text inputs. It should be noted that, for convenience, the description below will only refer to text or text reports, even when corresponding tokenized representations of the text or text reports could also be referred to.).

Regarding claim 3, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1, 
Xu further discloses: wherein the emotion detector implements one or more of a hidden Markov model, a support vector machine, a deep feed-forward model, a recurrent neural network, or a convolutional neural network ([pg. 5 2nd para] Specifically, first, using a feature extraction module to extract features and related machine learning algorithm, such as a convolutional neural network (CNN) and support vector machine (SVM) machine learning and advanced study of relative algorithm, then, using the speech segment closer to the Chinese language environment and more prone to producing training method of environment application layer, training the sentiment prediction model, then, using the trained model, to predict the characteristics of the unknown fragment so as to obtain the unknown fragment represented by emotion, because each model for the same recording segment prediction result may not be completely the same, at last, using integrated algorithm predicted result of each model are integrated so as to obtain each recording segment final mood.).

Regarding claim 4, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1, 
Hodge further discloses: further comprising initiating an intervention process upon determining that the user is being harassed ([0104] Audio and image recognition algorithms continuously analyze the audio and video from the cabin-facing camera as further described above. If any recognizable events occur, client device can announce to the driver that the inappropriate behavior is being recorded and stored in the cloud server and cannot be erased . . . and that such behavior should stop.).

Regarding claim 5, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 4, 
Hodge further discloses: wherein initiating the intervention process comprises one or more of: alerting an authority; alerting an administrator of the ride-sharing application; causing an alert to be displayed on a wireless device located within the vehicle; or blocking a driver user from accepting an order on the ride-sharing application ([0104] Audio and image recognition algorithms continuously analyze the audio and video from the cabin-facing camera as further described above. If any recognizable events occur, client device can announce to the driver that the inappropriate behavior is being recorded and stored in the cloud server and cannot be erased . . . and that such behavior should stop. Client device 101 can summon a police officer if the inappropriate behavior continues, or distressed responses and telling gestures from the passenger are detected.).

Regarding claim 6, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Hodge further discloses: wherein the microphone is included on a wireless device executing the ride-sharing application ([0037] According to another embodiment, a passenger can connect his or her smartphone wirelessly to a vehicle-mounted client device during a ride for live-streaming video.). 

Regarding claim 11, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Nagula additionally discloses: further comprising generating a vector representation of the text segment, wherein providing the text segment to the prediction model comprises providing the vector representation of the text segment to the prediction model ([col. 3, lines 34-37] Typically, representing text in vectorized, numeric, or any other software-friendly forms enables software to more smoothly and effectively process text inputs.).
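For illustration only, generating a vector representation of a text segment can be sketched with a simple bag-of-words count, one of the tokenization options Nagula names. This is a minimal sketch under stated assumptions: the function name, the fixed vocabulary, and whitespace tokenization are hypothetical and are not drawn from Nagula or from the claims.

```python
from collections import Counter
from typing import List, Sequence


def bag_of_words_vector(text_segment: str, vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the text segment."""
    counts = Counter(text_segment.lower().split())
    # A Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]
```

The resulting list of counts is the kind of numeric form that can then be passed to a prediction model in place of the raw text.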

Regarding claim 15, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Hodge additionally discloses: wherein the speaking user is one of the user being harassed or a user performing the harassment ([0104] According to another embodiment, client device 101 records video and audio of the driver and how he or she interacts with the passenger during a ride. Client device 101 continuously monitors the driver (as well as the passenger, as described above) for any uncomfortable actions and conversations towards the passenger, including any threats or sexual comments, swearing, smoking, or similar actions. Audio and image recognition algorithms continuously analyze the audio and video from the cabin-facing camera as further described above. If any recognizable events occur, client device can announce to the driver that the inappropriate behavior is being recorded…).

Regarding claim 16, Hodge discloses: A system configured to predict an occurrence of harassment of a user of a ride-sharing application, the system comprising: a non-volatile storage configured to store one or more prediction models useable to predict the occurrence of the harassment of the user ([0061] Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. The above and other needs are met by the disclosed methods, a non-transitory computer-readable storage medium storing executable code, and systems for streaming and playing back immersive video content.);
and a hardware processor of an interactive computing system in communication with the non-volatile storage, the hardware processor configured to execute specific computer-executable instructions to at least: ([0064] The client device 101 in this exemplary embodiment includes a location module 204, a wireless transceiver module 205, an audio I/O module 206, a video module 207, a touchscreen module 208, a sensor module 209, and an I/O module 215. In this embodiment, the different modules are implemented in hardware and software modules. In alternative embodiments, these modules can be hardware, software, or a combination of both. For example, alternative embodiments may be provided ... including a wireless modem, multimedia processor, security and optionally other signal co-processors, such as for example, one or more graphics processor unit (“GPU”) cores, ... and/or one or more vision processing units (“VPU”). In one embodiment, one or more SoC processors ...and several peripheral devices, including for example cellular, Wi-Fi, and Bluetooth transceivers, as further described below.):
receiving an audio segment comprising a portion of audio captured by a microphone located within a vehicle ([0029] According to another aspect of one embodiment, video and audio recorded during a ride by a vehicle-mounted client device may include driver behavior, such as, attention to driving conditions, distractions, and interactions with the customers, allowing for example, training or coaching of drivers. [0069] In addition, user input may be received through one or more microphones 212. In one embodiment, microphone 212 is a digital microphone connected to audio module 206 to receive user spoken input, such as user instructions or commands. Microphone 212 may also be used for other functions, such as user communications, audio component of video recordings, or the like.) {audio segment in particular is taught in Xu reference below};
converting the audio segment to a text segment ([0080]  The user's utterance is processed by a speech-to-text algorithm and the resulting text is stored as metadata associated with the video clip.);
associated with verbal harassment detection from non-volatile storage ([0104] According to another embodiment, client device 101 records video and audio of the driver and how he or she interacts with the passenger during a ride. Client device 101 continuously monitors the driver (as well as the passenger, as described above) for any uncomfortable actions and conversations towards the passenger, including any threats or sexual comments, swearing, smoking, or similar actions. Audio and image recognition algorithms continuously analyze the audio and video from the cabin-facing camera as further described above. If any recognizable events occur, client device can announce to the driver that the inappropriate behavior is being recorded and stored in the cloud server and cannot be erased . . . and that such behavior should stop.);

Hodge does not explicitly, but Xu discloses: audio segment ([pg. 2, 5th para] cut-off sentence module, for receiving the recording data transmitted from the recording module, the recording data is cut into sections according to relevant characteristic of phonetic;)
provide the audio segment to an emotion detector to obtain a detected emotion of a speaking user that made an utterance included in the audio segment ([pg. 2, 4th-7th para] A voice emotion identification system based on machine learning, comprising a recording module, sentence breaking module, speaker recognition module, a characteristic extracting module and emotion identification module, wherein, recording module for obtaining the recording data, using a correlation algorithm to noise pre-recording data; cut-off sentence module, for receiving the recording data transmitted from the recording module, the recording data is cut into sections according to relevant characteristic of phonetic; speaker recognition module for receiving the cut-off sentence transmitted from the module segments using a machine learning algorithm classifies the segment, and identifying the speaker according to the classification; a characteristic extracting module for receiving a segment transmitted from the cut-off sentence module, extracting the frequency spectrum characteristic of each segment and Mel frequency cepstrum coefficient, and after processing on the extracted segment features; emotional recognition module, feature extraction module for receiving the generated segment features through machine learning algorithm training the sentiment prediction model, and using integrated algorithm to integrate the prediction result of each model.);
and the detected emotion that the user is being harassed ([pg. 3, 4th para] As shown in FIG. 1, according to an embodiment of the present invention the voice emotion identification system based on machine learning, comprising a recording module, sentence breaking module, speaker recognition module, a characteristic extracting module and emotion identification module, wherein, recording module, sentence-break module and a characteristic extracting module belonging to the data pre-processing and recording module, sentence-break module and feature extraction module provides the prediction basis and improve the accuracy and stability in the prediction process, and provides can be used to predict characteristic. speaker recognition module and emotion identification module belonging to the prediction using fragment and characteristic data pre-obtained speaker and emotion of each segment for prediction; ).
Hodge and Xu are considered analogous art because both are in the related art of speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge with the teaching of Xu, to incorporate providing the audio segment to an emotion detector to obtain a detected emotion of the speaker that made the utterance. Combining the disclosures may improve the performance of the emotion prediction model, because features are extracted to represent each segment and the extracted features are passed to the emotion recognition module, as suggested by Xu (pg. 4, last para - pg. 5, 1st para).
Hodge in view of Xu does not explicitly, but Nagula discloses: accessing a prediction model ([col. 2, lines 6-13] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.)
provide at least the text segment to the prediction model to obtain a harassment prediction ([col. 4, lines 22-29] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.);
and determining based at least in part on the harassment prediction ([col. 4, lines 22-29] The system 100 further includes a prediction subsystem 140. The prediction subsystem 140 includes one or more predictive models of various types, including machine learning models (e.g., deep neural networks) that are each configured to receive as input text reports (or portions of text reports), process the input in accordance with current parameter values of the model, and to generate as output 142 corresponding concepts of the text reports in the input.).
Hodge, Xu and Nagula are considered analogous art because they are all in the related art of language understanding and recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, with the teaching of Nagula, to incorporate a prediction model. One would have been motivated to combine the disclosures because the disclosed technologies can be deployed in any organization including, for example, government agencies, banks, and hospitals, where real-time concept generation from text reports can help human decision makers to understand the nature of the text reports, to gain insights for improving operational efficiency, and especially, to improve the user experience of people using the services provided by such organizations, as suggested by Nagula (col. 2, lines 14-21).

Regarding claim 17, the claim recites the elements of computer-implemented method claim 5 as a system. Thus, the analysis in rejecting claim 5 is equally applicable to claim 17.

Regarding claim 19, the claim recites the elements of computer-implemented method claim 11 as a system. Thus, the analysis in rejecting claim 11 is equally applicable to claim 19.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hodge, in view of Xu, further in view of Nagula, and furthermore in view of Sutha (US Patent Application Publication No: US 20210086778 A1) hereinafter as Sutha.

Regarding claim 7, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1, 
Hodge in view of Xu, further in view of Nagula does not explicitly disclose, but Sutha discloses: wherein the audio segment comprises a filtered audio segment that is filtered to remove audio not generated by occupants of the vehicle ([0078]  The audio processor 216 may be further configured to optimize the speech data and filter noise therein to obtain a clear and filtered audio. Based on the filtered audio, the audio processor 216 may identify one or more emergency keywords or phrases uttered by the occupants 110 and 112, the context of the speech data, distress level of the speech data, pitch and count of the emergency keywords or phrases uttered by the occupants 110 and 112, and the state of consciousness of the occupants 110 and 112. Based on the filtered audio, the audio processor 216 may further identify the state of calmness or panic of the occupants 110 and 112, a presence or an absence of another occupant in the vehicle 102, and an altercation between the occupants 110 and 112 or between any of the occupant 110 or 112 and another individual.).
Hodge, Xu, Nagula and Sutha are considered analogous art because they are all in the related art of language understanding and recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, with the teaching of Sutha to incorporate an audio filter that removes audio not generated by occupants of the vehicle.  One would have been motivated to combine these disclosures because doing so would allow for better detection of voice and emotions and identification of the speakers, as suggested by Sutha ([0078]).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Hodge, in view of Xu, further in view of Nagula, furthermore in view of Sutha, and furthermore in view of applicant-provided reference Penilla et al. (US Patent No: US 10453453 B2) hereinafter as Penilla.

Regarding claim 8, Hodge in view of Xu, further in view of Nagula, furthermore in view of Sutha discloses: The computer-implemented method of claim 7, 
Hodge in view of Xu, further in view of Nagula, furthermore in view of Sutha does not explicitly disclose, but Penilla discloses: wherein the filtered audio segment is filtered to remove one or more of navigation audio, radio audio, ambient sounds from outside the vehicle, or ambient sounds generated by the vehicle during operation ([col. 4, lines 56-60] Optionally, the captured audio sample can be processed to remove noise, such as ambient noise, voice noise of other passengers, music playing in the vehicle, tapping noises, road noise, wind noise, etc. The audio sample is then processed to produce an audio signature.).
Hodge, Xu, Nagula, Sutha, and Penilla are considered analogous art because they are all in the related art of language understanding and recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, furthermore in view of Sutha, with the teaching of Penilla to incorporate an audio filter that removes ambient noise generated inside and outside the vehicle.  One would have been motivated to combine these disclosures because removing ambient noise would make the audio samples easier to analyze, as suggested by Penilla (Summary).

Claims 9-10, 12, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hodge, in view of Xu, further in view of Nagula, and furthermore in view of applicant-supplied reference Balci et al. (Balci, K., & Salah, A. A. (2015). Automatic analysis and identification of verbal aggression and abusive behaviors for online social games. Computers in Human Behavior, 53, 517–526. https://doi.org/10.1016/j.chb.2014.10.025) hereinafter as Balci.

Regarding claim 9, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1, 
Hodge in view of Xu, further in view of Nagula does not explicitly disclose, but Balci discloses: further comprising generating the prediction model that determines a likelihood of harassment of a first user by a second user ([sect 4] To be able to decide whether a player falls into the offender category or not, we train a supervised binary classifier, where the two classes stand for genuine offender and not an offender. This is done in an offline manner. When a new complaint arrives during the game, the proposed system evaluates the accused player’s profile, giving a likelihood of the player to be a genuine offender. Fig. 3 gives an overview of the basic flow of the system.),
wherein generating the prediction model comprises: accessing a set of training data comprising text generated from a set of audio segments, wherein at least some of the set of training data includes verbal harassment and at least some of the set of training data does not include verbal harassment; accessing a set of harassment labels associated with the set of training data, the set of harassment labels identifying an existence of or a type of harassment associated with each training data item ([sect 4] We propose a system to automatically analyze and rank player complaints. First, a training and benchmarking set is generated, where player complaints are manually labeled as ‘abusive’ or ‘offending’ by human moderators. For this study, a single moderator is assigned for the annotation task. Multiple annotators would certainly increase the quality of annotations, but at the cost of doubling or tripling the annotation expenses. The information of players involved in these complaints is extracted from a central game database. To be able to decide whether a player falls into the offender category or not, we train a supervised binary classifier, where the two classes stand for genuine offender and not an offender. This is done in an offline manner.);		
and using a machine learning algorithm to generate the prediction model based at least in part on the set of training data and the set of harassment labels ([sect 4.1] There are several supervised methods in the literature that can be used for player classification (Bishop, 2006). In our initial tests, we have evaluated several classifier schemes such as decision trees (Quinlan, 1993), support vector machines (Vapnik, 2000) and k-means clustering (Duda, Hart, & Stork, 2012), and observed that Bayes Point Machine can deal with this problem successfully. This approach is adopted as our primary classifier, and all results will be reported with this technique.  Please also see sect 4.2 and Fig. 3.) {prediction model is already discussed in claim 1 with the Nagula reference }.
Hodge, Xu, Nagula and Balci are considered analogous art because they are all in the related art of language understanding and recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, with the teaching of Balci to incorporate determining a likelihood of harassment, generating training data, identifying and labeling harassment data, and using a machine learning algorithm to generate the model.  One would have been motivated to combine these disclosures because doing so may improve the social interaction experience, as suggested by Balci (Introduction).

Regarding claim 10, Hodge in view of Xu, further in view of Nagula, furthermore in view of Balci discloses: The computer-implemented method of claim 9,
Balci further discloses: wherein the set of training data comprises historical audio segments obtained during prior orders generated on a ride-sharing application ([sect 2] Cyberbullying usually refers to prolonged mistreatment (Kwan & Skoric, 2013), whereas in the application we discuss here, abusive behaviors can also happen once. Reynolds, Kontostathis, and Edwards (2011) have proposed to use text-mining techniques for automatically detecting cyberbullying from Internet posts. This work resembles our approach, but relies exclusively on textual content, whereas we put the stress on historical factors to determine prior probabilities of exhibiting abusive behavior.).

Regarding claim 12, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Hodge in view of Xu, further in view of Nagula does not explicitly disclose, but Balci discloses: wherein determining that the user is being harassed comprises determining that at least one of a harassment prediction probability satisfies a first harassment probability threshold or a measure of the detected emotion exceeds a second harassment probability threshold ([sect 4.6] The BPM classifier outputs the likelihood of a player to be in the offender group. Since it is a binary classifier, for each sample, a likelihood value greater than a certain confidence threshold results in the classification of an offender. This threshold parameter trades-off sensitivity and specificity. In general, one will try to set a confidence threshold that results in high precision and specificity, so that human moderators can prioritize complaints about these players. High precision means that the system catches truly abusive players, whereas high specificity means that false accusations are rejected with a high probability.).
Hodge, Xu, Nagula and Balci are considered analogous art because they are all in the related art of language understanding and recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, with the teaching of Balci to incorporate determining that a harassment prediction probability satisfies a first harassment probability threshold.  One would have been motivated to combine these disclosures because higher precision means the system will prioritize and catch abusive players more accurately, as suggested by Balci (sect 4.6).
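
For illustration only, the either-threshold determination recited in claim 12 can be sketched as follows. This is a minimal sketch and not an implementation drawn from the application or from Balci: the function name, the default threshold values, and the reading of "satisfies" as greater-than-or-equal are all assumptions made for the example.

```python
def is_harassed_either(harassment_prob: float,
                       emotion_measure: float,
                       first_threshold: float = 0.8,    # hypothetical value
                       second_threshold: float = 0.7    # hypothetical value
                       ) -> bool:
    """Return True if at least one of the two tests passes.

    Assumption: "satisfies" is read as >= and "exceeds" as strictly >.
    """
    prediction_hit = harassment_prob >= first_threshold
    emotion_hit = emotion_measure > second_threshold
    return prediction_hit or emotion_hit


if __name__ == "__main__":
    # High prediction probability alone is enough under this reading.
    print(is_harassed_either(0.9, 0.1))
    # Neither test passes.
    print(is_harassed_either(0.5, 0.5))
```

Under this reading, lowering either threshold increases sensitivity at the cost of specificity, which is the trade-off described in the cited portion of Balci (sect 4.6).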

Regarding claim 18, the claim recites the elements of computer-implemented claim 9 as a system. Thus, the analysis in rejecting claim 9 is equally applicable to claim 18.

Regarding claim 20, the claim recites the elements of computer-implemented claim 12 as a system. Thus, the analysis in rejecting claim 12 is equally applicable to claim 20.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Hodge, in view of Xu, further in view of Nagula, and furthermore in view of Asmussen et al. (US Patent Application Publication No: US 20200159601 A1) hereinafter as Asmussen.

Regarding claim 13, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Hodge in view of Xu, further in view of Nagula does not explicitly disclose, but Asmussen discloses: wherein determining that the user is being harassed comprises determining that an aggregation of a harassment prediction probability and a measure of the detected emotion exceeds a third harassment probability threshold ([0039] ...in case the predicted failure probability assigned to the combination category exceeds the predetermined probability threshold assigned to the combination category; and/or the permit category having assigned a predetermined probability threshold, the response action may be further adapted for allowing the queried mount event if the predicted failure probability assigned to the permit category exceeds the predetermined probability threshold assigned to the permit category. {harassment prediction is covered in claim 12 and detected emotion is covered in claim 1.  Here, this reference teaches combining multiple categories and measuring whether a predetermined threshold probability is exceeded.}).
Hodge, Xu, Nagula and Asmussen are considered analogous art because they are all in the related art of event detection.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, with the teaching of Asmussen to incorporate determining that an aggregation of a harassment prediction probability and a measure of detected emotion exceeds a probability threshold.  One would have been motivated to combine these disclosures because doing so would allow a failure event to be predicted, as suggested by Asmussen (Summary).
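
For illustration only, the aggregated-threshold determination recited in claim 13 can be sketched as follows. Neither the claim nor Asmussen specifies how the two values are aggregated; the weighted average used here, the function name, and the default weight and threshold values are assumptions made for the example.

```python
def is_harassed_aggregate(harassment_prob: float,
                          emotion_measure: float,
                          third_threshold: float = 0.75,  # hypothetical value
                          weight: float = 0.5             # hypothetical value
                          ) -> bool:
    """Return True if a combined score exceeds a single threshold.

    Assumption: the aggregation is a weighted average of the two inputs;
    other aggregations (sum, max, product) would fit the claim language too.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    aggregate = weight * harassment_prob + (1.0 - weight) * emotion_measure
    return aggregate > third_threshold


if __name__ == "__main__":
    # Both inputs high: the combined score is above the threshold.
    print(is_harassed_aggregate(0.9, 0.8))
    # One input moderate: the combined score falls below the threshold.
    print(is_harassed_aggregate(0.9, 0.5))
```

Unlike a test applied to each value separately, a single aggregated score lets a strong signal in one input compensate for a weaker signal in the other.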

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Hodge, in view of Xu, further in view of Nagula, and furthermore in view of applicant-supplied reference Bakish et al. (US Patent Application Publication No: US 20180232511 A1) hereinafter as Bakish.

Regarding claim 14, Hodge in view of Xu, further in view of Nagula discloses: The computer-implemented method of claim 1,
Hodge in view of Xu, further in view of Nagula does not explicitly disclose, but Bakish discloses: wherein converting the audio segment to the text segment comprises applying the audio segment to a hidden Markov model or a deep learning model ([0055] (d) a voice print or speech sample which is acquired and/or produced by utilizing one or more biometric algorithms or sub-modules, such as a Neural Network module or a Hidden Markov Model (HMM) unit, which may utilize both the acoustic signal and the optical signal (e.g., the self-mixed signals of the optical microphone 101) in order to extract more data and/or more user-specific characteristics from utterances of the human speaker.).
Hodge, Xu, Nagula and Bakish are considered analogous art because they are all in the related art of language understanding and recognition.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Hodge, in view of Xu, further in view of Nagula, with the teaching of Bakish to incorporate applying the audio segment to a hidden Markov model.  One would have been motivated to combine these disclosures because doing so may utilize both the acoustic signal and the optical signal in order to extract more data and/or more user-specific characteristics from utterances of the human speaker, as suggested by Bakish ([0055]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Liu et al. (L. Liu, X. Zhang, M. Qiao and W. Shi, "SafeShareRide: Edge-Based Attack Detection in Ridesharing Services," 2018 IEEE/ACM Symposium on Edge Computing (SEC), 2018, pp. 17-29, doi: 10.1109/SEC.2018.00009.) hereinafter as Liu.  Liu discloses a method and system to protect drivers and passengers during rideshare.  “In this paper, we propose an edge-based three-stage attack detection framework, namely SafeShareRide, which aims to ensure the safety of share rides. The first stage uses speech recognition to detect keywords such as ‘help’ or a loud quarrel during a ride. The second stage is driving behavior detection. It collects driving data from an onboard diagnostics (OBD) adapter and smartphone sensors, and detects abnormal driving behaviors exhibited through speed, acceleration and the angular rate. The third stage is analyzing in-vehicle video recordings to determine whether there is an emergency. At the beginning of each detection period, the first two stages are running independently to capture in-vehicle danger. When the speech recognition recognizes a cry for help or the driving behavior detection discovers dangerous driving behaviors, video capture and analysis will be automatically activated to process the in-vehicle video of the current detection period. The detection results from the first two stages, and the extracted video will be sent to the cloud or edge server. Through this three-stage detection, SafeShareRide can provide highly accurate detection with very low bandwidth demand from video uploading.” (Liu, Introduction).  Also see Sections III-V.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to Phillip H Lam whose telephone number is (571)272-1721. The examiner can normally be reached 10 AM-6 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHILIP H LAM/ Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/ Primary Examiner, Art Unit 2656