DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1-20 are allowed.

The following is an examiner’s statement of reasons for allowance:
With respect to claim 1, the prior art of record fails to disclose, singly or in combination, or render obvious a computer-implemented method of event detection, the method comprising: 
obtaining, at one or more processors, multimedia data including multiple frames of video data and audio data corresponding to the video data; 
processing, by the one or more processors, the multiple frames of video data to detect at least one object depicted in the video data and to track the at least one object between two or more frames of the multiple frames; 
based on the video data, generating, by the one or more processors, sonification audio data using a sonification process, wherein the sonification audio data is a digital representation of a sound that represents a position of the at least one object in the two or more frames, a movement of the at least one object in the two or more frames, or both the position and the movement of the at least one object in the two or more frames, and wherein generating the sonification audio data comprises: 
mapping, by the one or more processors, one or more audio parameter values included in a look-up table to one or more aspects of data associated with the at least one object, wherein the one or more aspects include at least one of the position and the movement of the at least one object; 
combining, by the one or more processors, the audio data and the sonification audio data; and 
providing, by the one or more processors, input based on the combined audio data and sonification audio data to a trained event classifier to detect an event represented in the multimedia data.
Obara (US 2020/0053401), Altuev (US 2018/0098034), and Kruglick (US 2015/0350716) are the closest prior art relating to the Applicant's claimed invention. 
Obara discloses methods and systems for providing a video stream along with a slow-motion video showing a particular event depicted in the video stream. The method includes generating a first video stream and generating a second video stream, which is a slow-motion video stream, from the first video stream by modifying a playback speed of the first video stream. The method includes monitoring content of the first video stream to identify an event trigger of a predefined set of event triggers. Each event trigger indicates the presence in the first video stream of an event that is to be generated for display using the second video stream. The method includes determining, based on the identification of the event trigger, to transmit the second video stream along with the first video stream, and simultaneously transmitting both the first video stream and the second video stream.
Altuev discloses a method for exchanging data between an IP video camera having embedded video analytics and an external server. The method comprises generating at least one video frame by the IP video camera; converting the video frame to digital form by the IP video camera; processing the converted video frame via an IP processor of the video camera using computer vision techniques; creating metadata; and transferring the metadata to the external server for further use. The generated metadata is stored in the camera's IP storage and is then read by the server. The metadata is stored in the DBMS of the IP video camera; a search query to the DBMS is received from the external server and processed in the DBMS, and the search results are transferred from the DBMS to the external server.
Kruglick discloses a system for processing a collection of video recordings of a scene to extract and localize audio sources for the audio data. According to some examples, video recordings captured by mobile devices from different perspectives may be uploaded to a central database. Video segments capturing an overlapping portion of the scene at an overlapping time may be identified, and a relative location of each of the video capturing devices may be determined. Audio data for the video segments may be indexed with a sub-frame time reference and relative locations as a function of overlapping time. Using the indices that include the sub-frame time references and relative locations, audio sources for the audio data may be extracted and localized. The extracted audio sources may be transcribed and indexed to enable searching, and may be added back to each video recording as a separate audio channel.
The prior art of record does not disclose or render obvious the amended features.

With respect to claim 13, the prior art of record fails to disclose, singly or in combination, or render obvious a system for event detection, the system comprising: 
one or more processors; and 
one or more memory devices coupled to the one or more processors, the one or more memory devices storing instructions that are executable by the one or more processors to perform operations including: 
obtaining multimedia data including multiple frames of video data and audio data corresponding to the video data; 
processing the multiple frames of video data to detect at least one object depicted in the video data and to track the at least one object between two or more frames of the multiple frames; 
based on the video data, generating sonification audio data using a sonification process, wherein the sonification audio data is a digital representation of a sound that represents a position of the at least one object in the two or more frames, movement of the at least one object in the two or more frames, or both the position and the movement of the at least one object in the two or more frames, and wherein generating the sonification audio data comprises: 
mapping one or more audio parameter values included in a look-up table to one or more aspects of data associated with the at least one object, wherein the one or more aspects include at least one of the position and the movement of the at least one object; 
combining the audio data and the sonification audio data; and 
providing input based on the combined audio data and sonification audio data to a trained event classifier to detect an event represented in the multimedia data.
Obara (US 2020/0053401), Altuev (US 2018/0098034), and Kruglick (US 2015/0350716) are the closest prior art relating to the Applicant's claimed invention. 
Obara discloses methods and systems for providing a video stream along with a slow-motion video showing a particular event depicted in the video stream. The method includes generating a first video stream and generating a second video stream, which is a slow-motion video stream, from the first video stream by modifying a playback speed of the first video stream. The method includes monitoring content of the first video stream to identify an event trigger of a predefined set of event triggers. Each event trigger indicates the presence in the first video stream of an event that is to be generated for display using the second video stream. The method includes determining, based on the identification of the event trigger, to transmit the second video stream along with the first video stream, and simultaneously transmitting both the first video stream and the second video stream.
Altuev discloses a method for exchanging data between an IP video camera having embedded video analytics and an external server. The method comprises generating at least one video frame by the IP video camera; converting the video frame to digital form by the IP video camera; processing the converted video frame via an IP processor of the video camera using computer vision techniques; creating metadata; and transferring the metadata to the external server for further use. The generated metadata is stored in the camera's IP storage and is then read by the server. The metadata is stored in the DBMS of the IP video camera; a search query to the DBMS is received from the external server and processed in the DBMS, and the search results are transferred from the DBMS to the external server.
Kruglick discloses a system for processing a collection of video recordings of a scene to extract and localize audio sources for the audio data. According to some examples, video recordings captured by mobile devices from different perspectives may be uploaded to a central database. Video segments capturing an overlapping portion of the scene at an overlapping time may be identified, and a relative location of each of the video capturing devices may be determined. Audio data for the video segments may be indexed with a sub-frame time reference and relative locations as a function of overlapping time. Using the indices that include the sub-frame time references and relative locations, audio sources for the audio data may be extracted and localized. The extracted audio sources may be transcribed and indexed to enable searching, and may be added back to each video recording as a separate audio channel.
The prior art of record does not disclose or render obvious the amended features.

With respect to claim 20, the prior art of record fails to disclose, singly or in combination, or render obvious a computer program product for event detection, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by one or more processors to perform operations including: 
obtaining multimedia data including multiple frames of video data and audio data corresponding to the video data; 
processing the multiple frames of video data to detect at least one object depicted in the video data and to track the at least one object between two or more frames of the multiple frames; 
based on the video data, generating sonification audio data using a sonification process, wherein the sonification audio data is a digital representation of a sound that represents a position of the at least one object in the two or more frames, movement of the at least one object in the two or more frames, or both the position and the movement of the at least one object in the two or more frames, and wherein generating the sonification audio data comprises: 
mapping one or more audio parameter values included in a look-up table to one or more aspects of data associated with the at least one object, wherein the one or more aspects include at least one of the position and the movement of the at least one object; 
combining the audio data and the sonification audio data; and 
providing input based on the combined audio data and sonification audio data to a trained event classifier to detect an event represented in the multimedia data.
Obara (US 2020/0053401), Altuev (US 2018/0098034), and Kruglick (US 2015/0350716) are the closest prior art relating to the Applicant's claimed invention. 
Obara discloses methods and systems for providing a video stream along with a slow-motion video showing a particular event depicted in the video stream. The method includes generating a first video stream and generating a second video stream, which is a slow-motion video stream, from the first video stream by modifying a playback speed of the first video stream. The method includes monitoring content of the first video stream to identify an event trigger of a predefined set of event triggers. Each event trigger indicates the presence in the first video stream of an event that is to be generated for display using the second video stream. The method includes determining, based on the identification of the event trigger, to transmit the second video stream along with the first video stream, and simultaneously transmitting both the first video stream and the second video stream.
Altuev discloses a method for exchanging data between an IP video camera having embedded video analytics and an external server. The method comprises generating at least one video frame by the IP video camera; converting the video frame to digital form by the IP video camera; processing the converted video frame via an IP processor of the video camera using computer vision techniques; creating metadata; and transferring the metadata to the external server for further use. The generated metadata is stored in the camera's IP storage and is then read by the server. The metadata is stored in the DBMS of the IP video camera; a search query to the DBMS is received from the external server and processed in the DBMS, and the search results are transferred from the DBMS to the external server.
Kruglick discloses a system for processing a collection of video recordings of a scene to extract and localize audio sources for the audio data. According to some examples, video recordings captured by mobile devices from different perspectives may be uploaded to a central database. Video segments capturing an overlapping portion of the scene at an overlapping time may be identified, and a relative location of each of the video capturing devices may be determined. Audio data for the video segments may be indexed with a sub-frame time reference and relative locations as a function of overlapping time. Using the indices that include the sub-frame time references and relative locations, audio sources for the audio data may be extracted and localized. The extracted audio sources may be transcribed and indexed to enable searching, and may be added back to each video recording as a separate audio channel.
The prior art of record does not disclose or render obvious the amended features.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 9548048 B1		Solh; Mashhour et al.
US 20140101238 A1	Soon-Shiong; Patrick
US 20130253834 A1	Slusar; Mark

Inquiries
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUSHFIKH I ALAM whose telephone number is (571)270-1710. The examiner can normally be reached 1:00PM-9:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MUSHFIKH I. ALAM
Primary Examiner
Art Unit 2426



/MUSHFIKH I ALAM/Primary Examiner, Art Unit 2426                                                                                                                                                                                                        3/8/2022