Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's amendments filed on 11/10/2021 overcome the following set forth in the previous Office Action:
The objection to claim 7.
The rejection of claims 3-9, 12-18 and 20 under 35 USC §112(b) or 35 USC §112 (pre-AIA), second paragraph.
The rejection of claims 1-5, 7-14 and 16-20 under 35 USC §102 or 35 USC §103.
Applicant's arguments filed 11/10/2021 have been fully considered but they are not persuasive. The Office has thoroughly reviewed Applicant's arguments, which are moot in view of the new ground(s) of rejection necessitated by the filed amendments. Since all arguments are directed to the claim limitations as amended, not as originally filed, the responses to the arguments are detailed in the rejection section below.
Claim Objections
Claim 16 is objected to because of the following informalities:  
Claim 16 (line 8) recites “from in a clip of the video”, where “from” and “in” are redundant and thus “from in” should be replaced with either “from” or “in” to be grammatically correct.
Appropriate correction is required.

References Cited in Prior Art Rejections 
The following references are cited in the prior art rejections set forth below and are referred to as noted:
Visser et al., US 20130272548 A1, published on October 17, 2013, hereinafter Visser.  
QinetiQ’s EARS SWATS AKA IGDS: Shoulder mounted shot detection, https://www.thefirearmblog.com/blog/2017/08/10/qinetiqs-ears-swats-aka-igds-shoulder-mounted-shot-detection/, published on August 10, 2017, hereinafter QinetiQ.  
Weiss et al., US 20150249904 A1, published on September 3, 2015, hereinafter Weiss.  
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
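3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.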

Claims 1-3, 7-12 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Visser in view of QinetiQ.
Regarding claim 1, Visser discloses a computer-implemented method (Figs. 1-6) comprising: 
obtaining, by one or more processors, image data representing a scene and sound distribution information related to the scene; (Visser: 52 in Fig. 2, 202 in Fig. 3, and processors in Fig. 6)
determining, by one or more processors, a detection strategy to be applied in object detection based on the sound distribution information; (Visser: 64 and 66 in Fig. 2) 
wherein the detection strategy comprises focusing on regions of interest, (Visser: 64-68 in Fig. 5 and 524-532 in Fig. 6 and [0073-0074, 0094-0096, 0099]. “The ROI keypoint selector 524 receives coordinate information from the ROI selector 514 identifying the ROI in the captured image. Based on the coordinate information, the ROI keypoint select narrows down the image keypoint selection to those stable keypoints located within the ROI.” [0094]) and wherein determining the detection strategy (Visser: 64 and 66 in Fig. 2) comprises: 
determining whether a first element in the sound distribution information has a predefined semantic meaning; (Visser: 62 in Fig. 2) and

performing, by one or more processors, the object detection on the image data by applying the detection strategy. (Visser: 68-70 in Fig. 2)
Visser does not explicitly disclose, upon determining that the first element in the sound distribution information has the predefined semantic meaning, identifying a corresponding region of a digital image of the scene as a region of interest. 
However, QinetiQ teaches, in the same field of endeavor of object or event detection based on sound distribution information, 
determining whether a first element in the sound distribution information has a predefined semantic meaning; (QinetiQ: 1st paragraph on page 1. The claimed “predefined semantic meaning” is interpreted as the disclosed “gunshot”. “Their EARS system has been operational for a while as a vehicle and building mounted gunshot detection system.”) and 
upon determining that the first element in the sound distribution information has the predefined semantic meaning, identifying a corresponding region of a digital image of the scene as a region of interest. (QinetiQ: 2nd paragraph on page 1. “The remote has a visual display of the distance and angle, and the unit also gives off an audio cue of where the shot/s were detected coming from.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Visser’s disclosure with QinetiQ’s teachings 
Therefore, it would have been obvious to combine Visser with QinetiQ to obtain the invention as specified in claim 1. 
Regarding claim 2, Visser {modified by QinetiQ} discloses the method of claim 1, wherein obtaining the sound distribution information comprises: 
obtaining, by one or more processors, at least one sound signal from at least one sound collecting device deployed in the scene; (18 in Figs. 1 and 6) and 
generating, by one or more processors, the sound distribution information from the at least one sound signal. (502 in Fig. 6)
Regarding claim 3, Visser {modified by QinetiQ} discloses the method of claim 1, wherein the image data comprises the digital image of the scene, (202 in Fig. 3. “In step 202, the audio and visual information (still image and/or video) are recorded by the system 12.” [0077]) and determining the detection strategy comprises: 
identifying, by one or more processors, the region of interest in the digital image based on the sound distribution information; (406 in Fig. 5A and 514 in Fig. 6. “In step 406, a region of interest (ROI) in the image of the scene is selected, based on audio recorded from the scene.” [0079]. “From the microphone signals received from the microphone array 18, the DOA detector 512 determines the direction of arrival of sound emanating from a sound source located within the scene. … The ROI selector 514 estimates the location of the sound source based on the DOA information and known position of microphone array 18.” [0090]) and 
determining to focus on the region of interest. (64-68 in Fig. 5 and 524-532 in Fig. 6 and [0073-0074, 0094-0096, 0099]. “The ROI keypoint selector 524 receives coordinate information from the ROI selector 514 identifying the ROI in the captured image. Based on the coordinate information, the ROI keypoint select narrows down the image keypoint selection to those stable keypoints located within the ROI.” [0094])
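For illustration only, the relationship described in the cited passages (a direction of arrival mapped to a region of the image, followed by narrowing of keypoints to that region) can be sketched as below. This is not taken from Visser and is not part of the record. The pinhole camera model, the default 60-degree horizontal field of view, the fixed ROI width, and the function names `roi_from_doa` and `keypoints_in_roi` are all assumptions introduced for this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def roi_from_doa(azimuth_deg: float, image_width: int, image_height: int,
                 hfov_deg: float = 60.0, roi_width: float = 100.0) -> Box:
    """Map a sound direction of arrival to a vertical strip of the image.

    Assumes (hypothetically) a pinhole camera co-located with the microphone
    array, with azimuth 0 pointing at the image center.
    """
    # Focal length in pixels derived from the assumed horizontal field of view.
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg / 2.0))
    # Horizontal pixel position corresponding to the arrival direction.
    x_center = image_width / 2.0 + focal_px * math.tan(math.radians(azimuth_deg))
    # Clamp the strip to the image bounds.
    x_min = max(0.0, x_center - roi_width / 2.0)
    x_max = min(float(image_width), x_center + roi_width / 2.0)
    return (x_min, 0.0, x_max, float(image_height))


def keypoints_in_roi(keypoints: List[Point], roi: Box) -> List[Point]:
    """Keep only the keypoints whose coordinates fall inside the ROI."""
    x_min, y_min, x_max, y_max = roi
    return [(x, y) for (x, y) in keypoints
            if x_min <= x <= x_max and y_min <= y <= y_max]
```

Under these assumptions, a sound arriving from straight ahead in a 640 x 480 image yields a 100-pixel-wide strip centered at column 320, and only keypoints inside that strip are retained for subsequent matching.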
Regarding claim 7, Visser {modified by QinetiQ} discloses the method of claim 3, wherein focusing on the region of interest comprises at least one of the following: 
determining, by one or more processors, at least one parameter value to be applied in detecting a target object from the region of interest within the digital image; (64-68 in Fig. 5 and 524-532 in Fig. 6 and [0073-0074, 0094-0096, 0099]. “The ROI keypoint selector 524 receives coordinate information from the ROI selector 514 identifying the ROI in the captured image. Based on the coordinate information, the ROI keypoint select narrows down the image keypoint selection to those stable keypoints located within the ROI.” [0094]) or 
in accordance with a determination that the image data comprises a video, determining, by one or more processors, a frame rate for the video for sampling frames adjacent to the digital image from a clip of the video to perform the object detection, the frame rate being higher than a frame rate for sampling frames in a further clip of the video. (502 in Fig. 6)
Regarding claim 8, Visser {modified by QinetiQ} discloses the method of claim 7, wherein determining the at least one parameter value comprises at least one of the following: 
determining, by one or more processors, a first parameter value to be applied for selecting candidate blocks for detecting the target object, such that more candidate blocks are selected from the region of interest than from a remaining region in the digital image; (64-68 in Fig. 5 and 524-532 in Fig. 6 and [0073-0074, 0094-0096, 0099]. “The ROI keypoint selector 524 receives coordinate information from the ROI selector 514 identifying the ROI in the captured image. Based on the coordinate information, the ROI keypoint select narrows down the image keypoint selection to those stable keypoints located within the ROI.” [0094])
determining, by one or more processors, a second parameter value to be applied for scaling of the digital image in the object detection, such that more scaling levels are to be applied for scaling the digital image than for scaling a further digital image without the region of interest; or 
determining, by one or more processors, a third parameter value to be applied for scaling of the region of interest, such that more scaling levels are to be applied for scaling the region of interest than for scaling a remaining region in the digital image.
Regarding claim 9, Visser {modified by QinetiQ} discloses the method of claim 8, wherein the first parameter value comprises at least one of a size of a sliding window, a step size for moving a sliding window, a scoring criterion, and a size of a bounding box for selective search. (64-68 in Fig. 5 and 524-532 in Fig. 6 and [0073-0074, 0094-0096, 0099]. “The ROI keypoint selector 524 receives coordinate information from the ROI selector 514 identifying the ROI in the captured image. Based on the coordinate information, the ROI keypoint select narrows down the image keypoint selection to those stable keypoints located within the ROI.” [0094]. The claimed “size of a bounding box for selective search” is implied by the coordinate information of the ROI.)
Claims 10-12 and 16-18 are the apparatus (Visser: Fig. 6; [00292]) claims, respectively, corresponding to the method claims 1-3 and 7-9. Therefore, since claims 10-12 and 16-18 are similar in scope to claims 1-3 and 7-9, claims 10-12 and 16-18 are rejected on the same grounds as claims 1-3 and 7-9.
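Claims 19-20 are the computer program product (Visser: [00292]) claims, respectively, corresponding to the method claims 1 and 3. Therefore, since claims 19-20 are similar in scope to claims 1 and 3, claims 19-20 are rejected on the same grounds as claims 1 and 3.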

Claims 4-5 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Visser {modified by QinetiQ} as applied to claims 3 and 12 discussed above, and further in view of Weiss.
Regarding claim 4, Visser {modified by QinetiQ} discloses the method of claim 3, wherein identifying the region of interest comprises: 
determining, by one or more processors, whether an expected sound is produced in the scene based on the sound distribution information; (Visser: 406 in Fig. 5A and 514 in Fig. 6. “In step 406, a region of interest (ROI) in the image of the scene is selected, based on audio recorded from the scene.” [0079]. “From the microphone signals received from the microphone array 18, the DOA detector 512 determines the direction of arrival of sound emanating from a sound source located within the scene. … The ROI selector 514 estimates the location of the sound source based on the DOA information and known position of microphone array 18.” [0090]) and 
in accordance with the expected sound being determined, identifying, by one or more processors, a region of the digital image representing a geographic area of the scene where the expected sound is produced as the region of interest. (Visser: 406 in Fig. 5A and 514 in Fig. 6. “From the microphone signals received from the microphone array 18, the DOA detector 512 determines the direction of arrival of sound emanating from a sound source located within the scene. … The ROI selector 514 estimates the location of the sound source based on the DOA information and known position of microphone array 18.” [0090])
Visser {modified by QinetiQ} does not disclose explicitly but Weiss teaches, in the same field of endeavor of object or event detection based on sound distribution information, identifying a region of the digital image representing a geographic area of the scene where the expected sound is produced as the region of interest. (Weiss: [0031, 0038-0039, 0044].)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Visser’s {modified by QinetiQ} disclosure with Weiss’s teachings by combining the method for determining a region of interest based on sound distribution information (from Visser {modified by QinetiQ}) with the technique of identifying a region of interest in a geographic area of the scene based on sound distribution information (from Weiss) to yield no more than predictable use of prior art elements according to their established functions, since all the claimed elements, which are taught by the prior art references, would continue to operate in the same manner. In particular, the method for determining a region of interest based on sound distribution information would still work as in Visser {modified by QinetiQ}, and the technique of identifying a region of interest in a geographic area of the scene based on sound distribution information would continue to function as taught by Weiss. In fact, Weiss’s technique of identifying a region of interest in a geographic area of the scene based on sound distribution information would provide a practical and/or alternative implementation of the method for determining a region of interest based on sound distribution information.
Therefore, it would have been obvious to combine Visser {modified by QinetiQ} with Weiss to obtain the invention as specified in claim 4. 
Regarding claim 5, Visser {modified by QinetiQ and Weiss} discloses the method of claim 4, wherein the sound distribution information comprises a first heat map of the scene with elements indicating sound energy levels distributed across the scene, and wherein determining whether the expected sound is produced comprises: determining, by one or more processors, whether the first heat map comprises at least one second element indicating a sound energy level higher than a threshold level; and in accordance with a determination that the first heat map comprises the at least one second element, determining, by one or more processors, that the expected sound is produced in the scene. (Weiss: Figs. 5A-5C and [0031, 0038-0039, 0044] and claims 29 and 34. “FIGS. 5A, 5B, and 5C show the map area 60 displayed in heat maps 70, 72, 74 corresponding to a particular day on a display 25 of one of the mobile devices 12. Alternatively, the heat maps 70, 72, 74 can be displayed by another device 16, mobile or wired, not running the monitoring agent 13. The heat maps 70, 72, 74 are enabled on the display 25 by the user application 22. The heat maps 70, 72, 74 are generated based on sensor measurements including sound measurements which are run through a classifier in the analytics engine 30 to estimate sound levels in areas bounded by dark lines within the heat maps 70,72,74. Dimensionless sound levels represented by "1" for relatively low sound levels, "2" for relatively medium sound levels, and "3" for relatively high sound levels are displayed within the bounded areas to show relative sound levels during indicated time periods on the particular day.” [0038])
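For illustration only, the threshold test recited in claim 5 can be sketched as below. This is not taken from Weiss or from the application and is not part of the record. The heat map is modeled, as an assumption, as a plain 2-D list of the dimensionless sound levels (1, 2, 3) mentioned in the quoted passage, and the function names `elements_above_threshold` and `expected_sound_produced` are hypothetical.

```python
from typing import List, Tuple


def elements_above_threshold(heat_map: List[List[float]],
                             threshold: float) -> List[Tuple[int, int]]:
    """Return (row, column) positions whose sound level exceeds the threshold."""
    return [(r, c)
            for r, row in enumerate(heat_map)
            for c, level in enumerate(row)
            if level > threshold]


def expected_sound_produced(heat_map: List[List[float]],
                            threshold: float) -> bool:
    """Treat the expected sound as produced when at least one element
    of the heat map indicates a level higher than the threshold."""
    return len(elements_above_threshold(heat_map, threshold)) > 0
```

Under these assumptions, the positions returned by `elements_above_threshold` would correspond to the areas of the scene that could be mapped to a region of interest in the digital image.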
Claims 13-14 are the apparatus (Visser: Fig. 6; [00292]) claims, respectively, corresponding to the method claims 4-5. Therefore, since claims 13-14 are similar in scope to claims 4-5, claims 13-14 are rejected on the same grounds as claims 4-5.
Allowable Subject Matter
Claims 6 and 15 are objected to as being dependent upon rejected base claims, but would be allowable over the cited prior art references if rewritten in independent form including all of the limitations of the respective base claims and any intervening claims.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached on (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FENG NIU/Primary Examiner, Art Unit 2669