DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/11/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 6, 8, 10 and 13-15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by KUO et al. (US 20200012883 A1, hereinafter “KUO”).

Regarding claim 1. KUO discloses a monitoring system (the surveillance system 100) for locating and classifying an event in a monitoring area by a computation system (0006; Figures 1, 6 and 10), the monitoring system comprising: 
a visual three-dimensional (3D) capturing unit (cameras 110A), configured to capture and provide a geometric 3D information of the monitoring area (0032, 0042 and 0043; “[0043] … The computation apparatus 120 may calculate rough features such as the directions and distances of the three persons using the three-dimensional camera/depth camera, or triangular positioning techniques of directional microphones, thereby roughly defining the space in which the objects are located and the motion vectors of the objects. …”); 
an acoustic capturing unit (110B) with a microphone array and configured to derive and provide an acoustic information of the monitoring area (0042 and 0107-0109; Figure 6B);
an event detector (120) comprising an acoustic channel (noticeable acoustic object (AO) and LODAC process (block 902-2)) and a visual channel (noticeable video object (VO) and LODAC process (block 902-1)) to detect the event and to determine a localization of the event (0080, 0184-0187; Figures 4 and 9; “[0080] In step S404, local-object detection (LOD) is performed to determine whether there is any noticeable acoustic object (AO) in each piece of acoustic data. In step 406, it is determined whether a noticeable acoustic object has been found. If there is a noticeable acoustic object has been found, step S408 is performed to record the corresponding acoustic object. If there is no noticeable acoustic object has been found, step S402 is performed.”, “[0184] … the computation apparatus 120 performs a local-object-detection-and-correspondence (LODAC) process which includes a local-object-detection process and a local-object-correspondence process. … For example, in block 902-1, video data captured by one or more cameras 110A (e.g., cameras 110A-1˜110A-4) is received. In block 902-2, acoustic data captured by one or more microphones (e.g., microphones 110B-1˜110B-3) is received. …”), wherein 
the acoustic channel is provided with the acoustic information and is configured to detect the event as a sound event in the acoustic information and to determine a localization of the sound event in the monitoring area based on the acoustic information (0080-0081, 0083-0084 and 0109; Figure 4A; S410), or 
the visual channel is provided with the geometric 3D information and is configured to detect the event as a visual event in the geometric 3D information and to derive a localization of the visual event in the monitoring area based on the geometric 3D information (0067-0070; object recognition, motion trajectory of objects, distribution of objects), 
wherein the event detector (120) is configured to provide detected events with a region of interest (0116-0121; Figures 7A-B; “ROI”), comprising the localization (world-positioning-coordinate of VO and world-positioning-coordinate of AO) and a time information of the detected event (0070-0071 and 0084-0085; Figures 3A and 4A; “[0071] In step S312, all detected video objects are merged and analyzed. For example, one or more video objects may be detected from each piece of video data, and each video object may have its own world-positioning-coordinate information and time stamps. Accordingly, the computation apparatus 120 may determine whether the video objects in different video data are correlated according to the world-positioning-coordinate information and time stamps of each video object” and “[0085] In step S412, all detected acoustic objects are merged and analyzed. For example, one or more acoustic objects may be detected from each piece of acoustic data, and each acoustic object may have its own world-positioning-coordinate information and time stamps. Accordingly, the computation apparatus 120 may determine whether the acoustic objects in different acoustic data are correlated according to the world-positioning-coordinate information and time stamps of each acoustic object.”); and 
a classifier (120) provided with the geometric 3D information (world-positioning-coordinate), the acoustic information (AO), and the region of interest (ROI), and configured to analyze the region of interest (ROI) by processing the acoustic information and geometric 3D information within the region of interest in order to assign the detected event a class (determining global-object ID (GOID)) within a plurality of event classes (Figures 3A-3B and Figures 4A-4B; 0069-0082; wherein the plurality of classes are in both the acoustic audio information as well as in the visual 3D information for every detected event; see also, 0127; Figure 8A; list of detected global-identity-ID (GIID); “[0127] In step S810, the global-fusion feature of each global object is input to a global-object-recognition (GOR) model to perform identity recognition of each global object, thereby generating a global-identity-ID (GIID) list. For example, the computation apparatus 120 may record the GIID in the GIID list. In addition, the GIID list further records the global-recognition result and its confidence level.”).


Regarding claim 2. KUO discloses the monitoring system according to claim 1, wherein the classifier is configured to classify both, the acoustic information within the region of interest (Figures 4A-4B; 0078-0082) as well as the visual information within the region of interest (Figures 3A-3B; 0069-0077) individually (KUO describes individually classifying the acoustic information and the visual information, taking into account the respective localization and time; see also, Figure 9A-1 and 0184 ff).


Regarding claim 3. KUO discloses the monitoring system according to claim 1, wherein the classifier is configured to conjointly classify the acoustic information and the geometric 3D information within the region of interest in a multimodal classifier (Figures 9B-1, 9B-2 and 2; 0201 and 0049; sensor fusion calculation, “[0201] In addition, the context-retrieving process performed in each of blocks 904-1A, 904-1B, and 904-1C, for example, may refer to the recognized video object and obtain the context region and predicted ROI in the corresponding video frame. In block 904-1E, the computation apparatus 120 may perform a local-context-fusion process and ROI-fusion process to fuse the context regions and predicted ROIs respectively from blocks 904-1A, 904-1B, and 904-1C to obtain a fused-context region and a fused ROI. Blocks 904-2 (e.g., for acoustic data Audio1, Audio2, and Audio3) and 904-N (e.g., for smell data Smell1, Smell2, and Smell3) in FIGS. 9B-1 and 9B-2 may perform similar processes on the sensor data in the corresponding type to obtain the fused-context region and fused ROI in the corresponding type.”, “[0049] FIG. 2 is a schematic block diagram of a monitoring program 130 in accordance with an embodiment of the disclosure. The monitoring program 130, for example, includes a local-object-recognition module 131, a feature-fusion module 132, and a global-recognition module 133. The local-object-recognition module 131 is configured to perform a local-object process on each type of sensor data to generate local-object-feature information for each type.”).


Regarding claim 6. KUO discloses the monitoring system according to claim 1, wherein the localization of the sound event is derived with an acoustic localization in at least a direction, by an evaluation of the acoustic information of the sound event (0081; “[0081] For example, the computation apparatus 120 may detect whether there is any noticeable acoustic object in a temporal exploration region within an acoustic segment in each piece of acoustic data. In some embodiments, the computation apparatus 120 may detect particular object sounds or event sounds from each piece of acoustic data, such as sounds of gunshots, explosions, crying, noises, or percussion, but the disclosure is not limited thereto. That is, the computation apparatus 120 may determine that the above-mentioned aforementioned particular object sounds or event sounds are unusual sounds in the environment, and thus these sounds belong to the noticeable acoustic objects. For example, conventional speech-signal processing techniques such as “Mel-frequency cepstrum coefficient (MFCC)” method, or “linear prediction cepstrum coefficient (LPCC)” method can be used to perform feature extraction on the aforementioned unusual sounds.”).


Regarding claim 8. KUO discloses the monitoring system according to claim 1, wherein the acoustic information is provided to the classifier with a correcting of an influence of at least part of a 3D geometry of the monitoring area to acoustic information, which 3D geometry is derived from the geometric 3D information (0198; Figure 9A; “[0198] Accordingly, each of the local-object-recognition models in blocks 906-1˜906-N may be adjusted or updated according to the recognition result and its confidence level in the corresponding type (e.g., from block 920 through selector 922), and the local-detail features in the corresponding type (e.g., from block 926), so that a more accurate recognition result of the local-object recognition can be obtained.”).


Regarding claim 10. KUO discloses the monitoring system according to claim 1, wherein the region of interest is derived with a direction information from the localization of the sound event combined with a corresponding distance measurement in this direction from the geometric 3D information (0132-0133 and 0121; Figure 8; “[0133] With regard to the video-detail features, the color feature may include the difference values in density, saturation, and brightness. The texture feature may include the difference values in patterns. The shape feature may include the difference values in lines, relative positions, relative distances, and relative directions. With regard to the acoustic-detail features, the sound-volume feature may include the difference value in acoustic energy. The pitch feature may include the difference value in the acoustic frequency. The tone feature may include the difference values in proportions of harmonics or overtone. Since the property of each local-detail feature is different from each other, the difference value of each local-detail feature should be normalized to evaluate the normalized difference value of each local-detail feature. The normalized difference value indicates the relative importance of each selected LFF in the overall rating, and can be expressed by a natural number which may be a positive integer, a negative integer, or zero”).
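As a non-limiting illustration of the limitation recited in claim 10, the combination of a direction from the sound localization with a distance measured in that direction can be sketched as follows. The sketch is not taken from KUO or from the application; the function name, the spherical-coordinate convention (azimuth in the x-y plane, elevation above it), the cubic ROI shape, and the half_size_m parameter are all assumptions made for illustration.

```python
import math


def roi_from_direction_and_range(azimuth_deg, elevation_deg, distance_m, half_size_m=0.5):
    """Hypothetical helper: place a cubic region of interest at the 3D point
    obtained by walking distance_m along the acoustically derived direction.

    azimuth_deg / elevation_deg: direction of the sound event (assumed convention).
    distance_m: range measured along that direction from the geometric 3D data.
    half_size_m: assumed half edge length of the cubic ROI.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Spherical-to-Cartesian conversion of (direction, distance).
    center = (
        distance_m * math.cos(el) * math.cos(az),
        distance_m * math.cos(el) * math.sin(az),
        distance_m * math.sin(el),
    )
    lower = tuple(c - half_size_m for c in center)
    upper = tuple(c + half_size_m for c in center)
    return center, (lower, upper)
```

In this sketch the acoustic channel contributes only the two angles, while the range comes from the 3D capture; neither input alone fixes the ROI center.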


Regarding claim 13. Method claim 13 is drawn to the method of using the corresponding system claimed in claim 1. Therefore, method claim 13 corresponds to system claim 1 and is rejected for the same reasons of anticipation as set forth above.


Regarding claim 14. Non-transitory computer-readable storage medium claim 14 is drawn to a non-transitory computer-readable storage medium corresponding to the method claimed in claim 13. Therefore, non-transitory computer-readable storage medium claim 14 corresponds to method claim 13 and is rejected for the same reasons of anticipation as set forth above.


Regarding claim 15. Independent claim 15 has limitations similar to those of independent claims 1, 13 and 14 treated above. Claim 15, however, also recites a building or facility surveillance device, the device being installed stationarily at a surveillance-site to establish a monitoring system. KUO further discloses the surveillance device being installed stationarily at a surveillance-site to establish a monitoring system (0101, 0108 and 0218-0219; Figures 6 and 10; “[0219] … As depicted in FIG. 10, scene 1000, for example, is an area near a bank gate, wherein a camera 1010 and a directional microphone 1020 are installed on the bank gate to monitor the entry and exit of the bank gate 1001, and this area is defined as, for example, the first area 1031. After entering the bank through the bank gate 1001, there is a customer-waiting area in which a sofa 1002 is placed. A camera 1011 and a directional microphone 1021 are installed in the customer-waiting area for monitoring the range from the bank gate 1001 to the bank lobby and the customer-waiting area, and the monitoring range is defined as the second area 1032. There is an overlapping area between the first area 1031 and the second area 1032, and the overlapping area is defined as the third area 1033. The cameras 1011˜1011 and microphones 1021˜1021 are capable of monitoring the third area 1033.”).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 4, 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over KUO as applied to Claim 1 above, and in view of Noland (US 20180158305 A1, hereinafter “Noland”).

Regarding claim 4. KUO discloses the monitoring system according to claim 1, but fails to disclose wherein upon the event being detected, the classifier is configured to analyze the acoustic information with an applying of a numerical acoustic beamforming towards the localization of the detected event and within a limited time-interval around the detected event.
Noland, however, in the same field of endeavor, shows the monitoring system, wherein upon the event being detected, the classifier is configured to analyze the acoustic information with an applying of a numerical acoustic beamforming towards the localization of the detected event and within a limited time-interval around the detected event (0132 and 0170; Figure 3; “[0132] The acoustic sensor(s) 20 may comprise MEMS microphones, which have an omnidirectional pickup responsive equally to sounds coming from any direction. In other aspects, the acoustic sensor(s) 20 comprise multiple microphones disposed in an array to form a directional response, or a beam pattern. In yet other aspects, the acoustic sensor(s) 20 comprise a beamforming microphone array can be designed to be more sensitive to sound coming from one or more specific directions than sound coming from other directions and can employ beamforming techniques such as, but not limited to, conventional (fixed or switched beam) beamforming, adaptive beamforming phased array, desired signal maximization mode, and interference signal minimization or cancellation mode.”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify KUO with Noland’s teaching by configuring the classifier to analyze the acoustic information with an applying of a numerical acoustic beamforming towards the localization of the detected event and within a limited time-interval around the detected event (please see, Noland: [0132]).
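As a non-limiting illustration of what numerical acoustic beamforming towards a localization within a limited time-interval can involve, a basic delay-and-sum beamformer is sketched below. The sketch is not drawn from Noland, KUO, or the application; delay-and-sum is chosen here only as the simplest conventional technique, and the function name, the integer-sample delay rounding, the sample-index window (start, stop), and the assumed speed of sound of 343 m/s are all assumptions.

```python
import math


def delay_and_sum(signals, mic_positions, target, fs, start, stop, c=343.0):
    """Hypothetical delay-and-sum beamformer steered at a 3D target point.

    signals: one list of samples per microphone (equal sampling rate fs, in Hz).
    mic_positions / target: (x, y, z) coordinates in metres.
    start, stop: sample-index window around the detected event (stop exclusive).
    c: assumed speed of sound in m/s.
    """
    dists = [math.dist(m, target) for m in mic_positions]
    nearest = min(dists)
    # Extra propagation delay of each microphone relative to the nearest one,
    # rounded to whole samples (an assumed simplification).
    delays = [round((d - nearest) / c * fs) for d in dists]
    out = []
    for n in range(start, stop):
        acc = 0.0
        for sig, k in zip(signals, delays):
            idx = n + k  # read each channel later by its extra delay
            if 0 <= idx < len(sig):
                acc += sig[idx]
        out.append(acc / len(signals))
    return out
```

Sound arriving from the target point adds coherently across the channels after alignment, while sound from other locations is attenuated by the averaging; restricting the loop to the window keeps the analysis to the interval around the event.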


Regarding claim 5. KUO discloses the monitoring system according to claim 1, but fails to disclose wherein the visual 3D capturing unit is configured with a laser range finder with a pivotable measurement direction, and is configured to derive a point cloud of the monitoring area.
Noland, however, in the same field of endeavor, shows the monitoring system, wherein the visual 3D capturing unit is configured with a laser range finder with a pivotable measurement direction, and is configured to derive a point cloud of the monitoring area (0123; “[0123] Yet another countermeasure that can be integrated (e.g., physically or operatively) with the threat sensing system 10 is the Dazzler, a light-based weapon intended to temporarily blind or disorient a target with intense directed radiation (e.g., visible light output by laser diodes or diode-pumped solid-state lasers), but without causing of any long-term damage to the eyes. Again, as with the incapacitator, the Dazzler countermeasure need not be physically integrated with the threat sensing device 10, and may be separately disposed in any convenient location, with communication between the threat sensing device and countermeasure occurring via a conventional wireless (e.g., wi-fi, Bluetooth, etc.) or hardwired connection. Other possible countermeasures could include directed energy weapons (e.g., lasers, directional acoustic weapons), such as the SaberShot laser dazzler, outputting 250 Mw of 532 nm green-laser or a highly directional, high power speaker configured to produce sounds in a narrow beam at a debilitating 150 dB.”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify KUO with Noland’s teaching by configuring the visual 3D capturing unit with a laser range finder with a pivotable measurement direction and to derive a point cloud of the monitoring area, which yields predictable results (please see, Noland: [0123]).


Regarding claim 12. Claim 12 has limitations similar to those treated above in the rejection of claim 4 and is rejected for the same reasons of obviousness as set forth above.



Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over KUO as applied to Claim 1 above, and in view of Shrestha (US 11195067 B2, hereinafter “Shrestha”).

Regarding claim 9. KUO discloses the monitoring system according to claim 1, but fails to disclose wherein the classifier is embodied with an at least semi-supervised deep learning algorithm trained on a set of training data which is at least partially artificially generated based on digital models.
Shrestha, however, in the same field of endeavor, shows the monitoring system, wherein the classifier is embodied with an at least semi-supervised deep learning algorithm trained on a set of training data which is at least partially artificially generated based on digital models (Column 6 lines 57-65; “In some implementations, the comprehension system 120 performs any suitable machine learning process, including one or more of: supervised learning (e.g., using logistic regression, back propagation neural networks, random forests, decision trees, etc.), unsupervised learning (e.g., using an Apriori algorithm, k-means clustering, etc.), semi-supervised learning, reinforcement learning (e.g., using a Q-learning algorithm, temporal difference learning, etc.), and any other suitable learning style.”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify KUO with Shrestha’s teaching by incorporating the at least semi-supervised deep learning algorithm trained on a set of training data which is at least partially artificially generated based on digital models into the classifier in order to have a neural network that can be trained on artificially generated data without supervision or with only partial supervision (please see, Shrestha: column 4 lines 55-60).



Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over KUO as applied to Claim 1 above, and in view of Scalisi (US 20200358908 A1, hereinafter “Scalisi”).

Regarding claim 11. KUO discloses the monitoring system according to claim 1, but fails to disclose wherein: 
the visual 3D capturing unit has a standby mode and an alert mode, wherein in the standby mode a rate of capturing the geometric 3D information is lower than in the alert mode, and 
in the acoustic channel, the acoustic information is continuously provided to the event detector to detect sound events, and upon a detection of the sound event, the visual 3D capturing unit is set into the alert mode.
Scalisi, however, in the same field of endeavor, shows the monitoring system, wherein: 
the visual 3D capturing unit has a standby mode and an alert mode, wherein in the standby mode a rate of capturing the geometric 3D information is lower than in the alert mode (0229 and 0231; Figure 17; “[0229] Step 700 can include entering a Sleep Mode. In some embodiments, Sleep Mode has lower power consumption than Standby Mode and/or Alert Mode. In several embodiments, Sleep Mode turns off, powers down, and/or reduces the activity of one or more components and/or assemblies. In some embodiments, the camera is off, not recording, and/or in Low Power Mode while the system is in Sleep Mode. In some embodiments, the speaker is off, not recording, and/or in Low Power Mode while the system is in Sleep Mode. In several embodiments, the microphone is off, not recording, and/or in Low Power Mode while the system is in Sleep Mode.”, “[0231] In some embodiments, thresholds necessary to exit the Sleep Mode and enter a Standby Mode are less than thresholds necessary to exit the Standby Mode and enter an Alert Mode. In several embodiments, greater motion, closer proximity, and/or louder noise are necessary to enter an Alert Mode than are necessary to exit the Sleep Mode and enter a Standby Mode. In some embodiments, button contact is necessary to enter an Alert Mode. In some embodiments, a system will exit the Sleep Mode and enter a Standby Mode upon detecting motion, detecting motion within 10 feet, or detecting motion within 20 feet. In some embodiments, a system will exit the Sleep Mode and enter a Standby Mode upon detecting sound, upon detecting sound louder than 10 decibels, upon detecting sound louder than 25 decibels, upon detecting sound louder than 50 decibels, upon detecting sound louder than 80 decibels, or upon detecting sound louder than 90 decibels.”), and 
in the acoustic channel, the acoustic information is continuously provided to the event detector to detect sound events, and upon a detection of the sound event, the visual 3D capturing unit is set into the alert mode (0232 and 0234; Figure 17; “[0232] In several embodiments, Standby Mode turns on, powers up, and/or increases the activity (e.g., electrical activity, detection activity, detecting) of one or more components and/or assemblies (relative to Sleep Mode). In some embodiments, the camera is on, recording, and/or in an Intermediate Power Mode while the system is in Standby Mode. In some embodiments of Standby Mode, the camera is configured to quickly start recording, but is not recording. In several embodiments of Standby Mode, the microphone is on, in Detection Mode, and/or detecting sounds to help the system determine if it should change to Alert Mode.”, and “[0234] In some embodiments of Alert Mode, the system has determined that a visitor is present and/or attempting to contact a person in the building (e.g., the visitor is ringing a doorbell, waiting by the doorbell, knocking on a door). Some embodiments go into Alert Mode even if the visitor is not trying to contact a person in the building (e.g., the visitor could be a person trying to break into the building). The system can be configured to enter Alert Mode if the system detects a visitor within 20 feet, within 10 feet, or within five feet. The system can be configured to enter Alert Mode if the system detects a sound greater than 50 decibels, 80 decibels, and/or 90 decibels. The system can be configured to enter Alert Mode if a visitor presses a doorbell button and/or triggers a proximity sensor.”).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify KUO with Scalisi’s teaching by placing the camera in standby or sleep mode until a triggering event is detected by the acoustic sensor in order to conserve power and reduce power consumption (please see, Scalisi: [0229] and Figure 17).
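As a non-limiting illustration of the mode switching recited in claim 11, the following sketch shows an acoustic channel that is evaluated on every frame and a 3D capture rate that is raised once a sound event is detected. The sketch is not taken from Scalisi, KUO, or the application; the class name, the capture rates, the decibel threshold, and the simple level-threshold detector are all hypothetical values and simplifications chosen for illustration.

```python
class CaptureModeController:
    """Hypothetical controller for a 3D capturing unit with two modes."""

    def __init__(self, standby_hz=1.0, alert_hz=10.0, threshold_db=50.0):
        if standby_hz >= alert_hz:
            raise ValueError("standby rate must be lower than alert rate")
        self.standby_hz = standby_hz      # assumed low 3D capture rate
        self.alert_hz = alert_hz          # assumed high 3D capture rate
        self.threshold_db = threshold_db  # assumed sound-event threshold
        self.mode = "standby"

    @property
    def capture_rate_hz(self):
        """3D capture rate implied by the current mode."""
        return self.alert_hz if self.mode == "alert" else self.standby_hz

    def on_acoustic_frame(self, level_db):
        """Called for every acoustic frame (the acoustic channel runs
        continuously); a frame above the threshold is treated as a sound
        event and sets the 3D capturing unit into alert mode."""
        if level_db > self.threshold_db:
            self.mode = "alert"
        return self.capture_rate_hz

    def reset(self):
        """Return to standby, e.g. after the event has been handled."""
        self.mode = "standby"
```

In this sketch a quiet frame does not end the alert mode on its own; the return to standby is left to an explicit reset, which is a design assumption rather than something stated in the cited references.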


Allowable Subject Matter
Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASMAMAW G TARKO whose telephone number is (571)272-9205. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chris Kelley, can be reached at (571) 272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


ASMAMAW G. TARKO
Examiner, Art Unit 2482



/CHRISTOPHER S KELLEY/
Supervisory Patent Examiner, Art Unit 2482