DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.
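For illustration only, the three-prong test and the initial presumption described above can be sketched as a small decision function. This is a simplified sketch, not a statement of the law or of Office procedure; the class name, field names, and function names are hypothetical, and each field stands in for a finding that in practice requires claim-by-claim analysis.

```python
from dataclasses import dataclass


@dataclass
class ClaimLimitation:
    # Hypothetical fields; each one stands in for a finding made on the record.
    uses_means_or_step: bool            # literal "means" or "step" appears
    uses_generic_placeholder: bool      # nonce term used as a substitute for "means"
    modified_by_function: bool          # prong (B): functional language is present
    recites_sufficient_structure: bool  # structure, material, or acts for the function


def presumed_112f(lim: ClaimLimitation) -> bool:
    """Starting presumption: "means" (or "step") used with functional language."""
    return lim.uses_means_or_step and lim.modified_by_function


def invokes_112f(lim: ClaimLimitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.modified_by_function
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c
```

Under this sketch, a limitation such as "means for displaying" with no further structure in the claim satisfies all three prongs, while the same limitation reciting sufficient structure starts with the presumption but does not end up invoking 35 U.S.C. 112(f).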
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Specifically, claim 20 recites means for receiving, means for automatically determining, means for selectively modifying, and means for displaying.
Upon reviewing the specification of the current application, the Examiner finds the following corresponding structure:
means for receiving as a memory via a processor and a bus (Fig. 3),
means for automatically determining as a processor executing corresponding code in the memory (Fig. 3),
means for selectively modifying as a processor executing corresponding code in the memory (Fig. 3), and 
means for displaying as a display with a speaker ([0045] and [0048]).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4, 11-14, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Long et al. (US 2019/0138748 A1 – hereinafter Long).
Regarding claim 1, Long discloses a system, comprising: a video camera (Fig. 1 – video camera 102 capturing video images as further described at least at [0058]) comprising: an image sensor configured to capture image data for a video stream (Fig. 1; [0023] – image sensor 108 capturing image data for a video stream as further described at least at [0058]); and a microphone configured to capture audio data for the video stream (Fig. 1; [0024] – microphone 110 capturing audio data for a video stream as further described at least at [0058]); and a controller configured to: receive the video stream from the video camera (Fig. 1; [0026] – edge computing device 114 configured to receive the video stream from the camera to process the received data including video images and sounds, to remove personally identifiable data); determine a human speaking condition from the video stream ([0063] – recognizing voice in the audio, thus determining a human speaking condition from the video stream because recognized voice exists in the video stream); selectively modify, responsive to determining the human speaking condition, the audio data in the video stream during the human speaking condition ([0063]-[0064] – redacting the recognized voice in the audio data in the video stream); and store the video stream with the modified audio data ([0067] – transmitting the modified video stream to a remote storage device for storage).
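For illustration only, the limitation of selectively modifying the audio data during the human speaking condition can be pictured as silencing the audio samples that fall inside detected speech intervals. The sketch below is not taken from Long or from the application; the function name, the sample-index interval representation, and the choice of silence as the modification are all assumptions made for the example.

```python
from typing import List, Tuple


def mask_speaking_intervals(
    samples: List[float],
    speaking_intervals: List[Tuple[int, int]],
) -> List[float]:
    """Return a copy of `samples` with every [start, end) interval set to silence.

    `speaking_intervals` is a hypothetical stand-in for the output of whatever
    step determines the human speaking condition.
    """
    masked = list(samples)  # leave the caller's unmodified audio intact
    for start, end in speaking_intervals:
        # Clamp the interval so out-of-range indices are ignored.
        first = max(0, start)
        last = min(len(masked), end)
        for i in range(first, last):
            masked[i] = 0.0
    return masked
```

Samples outside the listed intervals are returned unchanged, which is the sense in which the modification is selective.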
Regarding claim 2, Long also discloses selectively modifying the audio data in the video stream during the human speaking condition includes processing the audio data to mask a human voice component of the audio data ([0063]-[0064] – redacting the recognized voice in the audio data in the video stream).
Regarding claim 3, Long also discloses the controller is further configured to determine the human voice component of the audio data during the human speaking condition ([0063]-[0064] – determine the recognized human voice component for redacting the recognized component in the audio data in the video stream).
Regarding claim 4, Long also discloses processing the audio data to mask the human voice component of the audio data includes: separating the human voice component of the audio data from a non-voice remainder component of the audio data ([0063]-[0064] – recognizing parts comprising voice in the audio data in the video stream); selectively masking the human voice component of the audio data ([0063]-[0064] – redacting the recognized voice in the audio data in the video stream); and leaving the non-voice remainder component of the audio data unchanged ([0063]-[0064] – redacting the recognized voice in the audio data in the video stream thus leaving the non-voice component unchanged).
Claim 11 is rejected for the same reason as discussed in claim 1 above.
Claim 12 is rejected for the same reason as discussed in claim 2 above.
Claim 13 is rejected for the same reason as discussed in claim 3 above.
Claim 14 is rejected for the same reason as discussed in claim 4 above.
Regarding claim 20, Long discloses a surveillance system (Fig. 8; [0072]), comprising: a video camera (Fig. 1 – camera 102) comprising: an image sensor configured to capture image data for a video stream (Fig. 1; [0023] – image sensor 108 capturing image data for a video stream as further described at least at [0058]); and a microphone configured to capture audio data for the video stream (Fig. 1; [0024] – microphone 110 capturing audio data for a video stream as further described at least at [0058]); a processor ([0072]); a memory ([0073]); means for receiving the video stream from the video camera (Fig. 8; [0026] – a memory, via a processor and a bus, configured to receive the video stream from the camera for the processor to process the received data including video images and sounds, to remove personally identifiable data); means for automatically determining a human speaking condition from the video stream ([0063]; Fig. 8 – the processor with necessary software for recognizing voice in the audio, thus determining a human speaking condition from the video stream because recognized voice exists in the video stream); means for selectively modifying, responsive to determining the human speaking condition, the audio data in the video stream during the human speaking condition ([0063]-[0064] – the processor with necessary software for redacting the recognized voice in the audio data in the video stream); and means for displaying the video stream using the modified audio data (Fig. 8; [0070] – an output 820, comprising a display and speaker, for displaying the video stream).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5-7 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Long as applied to claims 1-4, 11-14, and 20 above, and further in view of Merkel (US 2021/0295851 A1 – hereinafter Merkel).
Regarding claim 5, see the teachings of Long as discussed in claim 4 above. However, Long does not disclose processing the audio data to mask the human voice component of the audio data further includes: separating the human voice component of the audio data into a plurality human voice components corresponding to individual speakers of a plurality of speakers; identifying at least one individual speaker of the plurality of speakers; and selectively masking, responsive to identifying the at least one individual speaker, a portion of the human voice component of the audio data corresponding to the at least one individual speaker.
Merkel discloses processing audio data to mask a human voice component of the audio data includes: separating the human voice component of the audio data into a plurality human voice components corresponding to individual speakers of a plurality of speakers (Fig. 4; [0043]); identifying at least one individual speaker of the plurality of speakers (Fig. 4; [0022]; [0043]); and selectively masking, responsive to identifying the at least one individual speaker, a portion of the human voice component of the audio data corresponding to the at least one individual speaker ([0022]; [0032]).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Merkel into the system taught by Long to allow application of a policy permission that flexibly redacts only the speech of particular speakers among a group of speakers.
Regarding claim 6, see the teachings of Long as discussed in claim 1 above. However, Long does not disclose the controller is further configured to: detect an audio event from the audio data in the video stream; and process, using a voice recognition algorithm, the audio data in the video stream to identify a protected speaker; determining the human speaking condition from the video stream is responsive to detecting the audio event; and selectively modifying the audio data in the video stream during the human speaking condition is further responsive to identifying the protected speaker.
Merkel discloses a controller is further configured to: detect an audio event from audio data in a stream ([0007]; [0029] – detecting speech in audio data); and process, using a voice recognition algorithm, the audio data in the stream to identify a protected speaker ([0007]; [0030]-[0032] – processing the audio data using audio profiles to recognize a voice, to identify a speaker protected from policy permissions); determining the human speaking condition from the stream is responsive to detecting the audio event ([0007]; [0029]-[0032] – identifying a speaker speaking); and selectively modifying the audio data in the stream during the human speaking condition is further responsive to identifying the protected speaker ([0030]-[0032] – modifying the audio data by redacting the audio data of the protected speaker).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Merkel into the system taught by Long to process the audio data in the video stream and allow application of a policy permission that flexibly redacts only the speech of particular speakers among a group of speakers.
Regarding claim 7, see the teachings of Long as discussed in claim 1 above. However, Long does not disclose the controller is further configured to process, using a voice recognition algorithm, the audio data in the video stream to identify a consent pattern; and selectively modifying the audio data in the video stream during the human speaking condition is further responsive to an absence of the consent pattern in the audio data.
Merkel discloses a controller is configured to process, using a voice recognition algorithm, the audio data in the video stream to identify a consent pattern ([0007]; [0030]-[0032] – processing the audio data using audio profiles to recognize a voice, to identify a speaker not protected from policy permissions); and selectively modifying the audio data in the video stream during the human speaking condition is further responsive to an absence of the consent pattern in the audio data ([0030]-[0032] – modifying the audio data by redacting the audio data of a protected speaker).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Merkel into the system taught by Long to process the audio data in the video stream and allow application of a policy permission that flexibly redacts only the speech of particular speakers among a group of speakers.
Claims 15 and 16 are rejected for the same reasons as discussed in claim 6 above.
Claim 17 is rejected for the same reason as discussed in claim 7 above.
Claims 8-9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Long as applied to claims 1-4, 11-14, and 20 above, and further in view of Peleg et al. (US 2020/0143838 A1 – hereinafter Peleg).
Regarding claim 8, see the teachings of Long as discussed in claim 1 above. However, Long does not disclose the controller is further configured to detect a video event from the image data in the video stream; and determining the human speaking condition from the video stream is responsive to detecting the video event.
Peleg discloses a controller is configured to detect a video event from image data in a video stream ([0036]; [0041] – detecting a presence of a person in a video); and determining the human speaking condition from the video stream is responsive to detecting the video event ([0007]-[0008]; [0052] – identifying voice of the person).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Peleg into the system taught by Long to redact the voice of a protected speaker who is recognized from the video images, thus providing a more effective way of performing redaction.
Regarding claim 9, see the teachings of Long and Peleg as discussed in claim 8 above, in which Peleg also discloses the controller is further configured to process, using a facial recognition algorithm, the image data in the video stream to identify a protected speaker ([0007]-[0008]; [0052]; [0073] – using face recognition to identify a person); and selectively modifying the audio data in the video stream during the human speaking condition is further responsive to identifying the protected speaker ([0007]-[0008]; [0052] – removing sounds of the person while keeping other soundtracks). The motivation for incorporating the teachings of Peleg into the system of Long has been discussed in claim 8 above.
Claim 18 is rejected for the same reason as discussed in claims 8-9 above.
Claims 10, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Long as applied to claims 1-4, 11-14, and 20 above, and further in view of Jayapalan et al. (US 10789385 B1 – hereinafter Jayapalan).
Regarding claim 10, see the teachings of Long as discussed in claim 1 above. However, Long does not disclose a storage device configured to store: an unmodified video stream from the video camera; and the video stream with the modified audio data from the controller; an analytics engine configured to: process the unmodified video stream to determine the human speaking condition from the video stream; and notify the controller of the human speaking condition; and a user device configured to: display, using a graphical user interface and a speaker of the user device, the video stream with the modified audio data to a user; determine a security credential for the user; verify an audio access privilege corresponding to the security credential; and display, using the graphical user interface and the speaker of the user device, the unmodified video stream to the user.
Jayapalan discloses a storage device configured to store: an unmodified video stream from the video camera (column 6, lines 26-41; Fig. 1 – storing original media, including a video stream from a video camera as further described at least at column 11, line 65); and the video stream with the modified audio data from a controller (Fig. 1 – storing redacted media 128); an analytics engine configured to: process the unmodified video stream to determine the human speaking condition from the video stream (column 8, lines 24-48 – processing the original media 114 to determine a human speaking condition during a conversation, as described at least at column 6, lines 28-32, that contains sensitive information and needs to be redacted); and notify the controller of the human speaking condition (column 9, line 54 – column 10, line 6; Fig. 2 – notifying a controller to tag the portion containing sensitive information); and a user device configured to: display, using a graphical user interface and a speaker of the user device (column 11, line 66 – column 12, line 2; column 14, lines 30-32 – via a speaker and a graphical UI of a user device), the video stream with the modified audio data to a user (Fig. 4 – step 412); determine a security credential for the user (Fig. 4 – step 408); verify an audio access privilege corresponding to the security credential (Fig. 4 – a ‘yes’ following step 408); and display, using the graphical user interface and the speaker of the user device, the unmodified video stream to the user (Fig. 4 – step 410).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Jayapalan into the system taught by Long to allow authorized users to fully access the original video stream, e.g. by law officials for legal purposes or by staff members who need full access to the original data to perform a business transaction, etc.
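For illustration only, the credential-gated access recited in claim 10 (display the modified stream by default, and the unmodified stream once an audio access privilege is verified) can be sketched as a simple selection step. The sketch is not drawn from Jayapalan or from the application; the class, the function, and the use of a set of privileged credentials are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class StoredStreams:
    # Hypothetical identifiers for the two stored versions of one recording.
    unmodified: str
    redacted: str


def stream_for_user(
    streams: StoredStreams,
    credential: str,
    audio_privileged: Set[str],
) -> str:
    """Pick which stored stream a user device should play back."""
    # Verify the audio access privilege tied to the user's security credential.
    if credential in audio_privileged:
        return streams.unmodified
    # Without the privilege, only the stream with modified audio is shown.
    return streams.redacted
```

A real system would authenticate the credential rather than compare strings; the set lookup here only marks where that verification would occur.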
Regarding claim 15, see the teachings of Long as discussed in claim 11 above. However, Long does not disclose detecting an audio event from the audio data in the video stream, wherein determining the human speaking condition from the video stream is responsive to detecting the audio event.
Jayapalan discloses detecting an audio event from audio data in a video stream, wherein determining the human speaking condition from the video stream is responsive to detecting the audio event (column 8, lines 24-48 – processing the original media 114 to determine a human speaking condition during a conversation, as described at least at column 6, lines 28-32, that contains sensitive information and needs to be redacted).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Jayapalan into the system taught by Long to detect and redact only the part that contains sensitive information, thus keeping the conversation as comprehensible as possible.
Claim 19 is rejected for the same reason as discussed in claim 10 above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG Q DANG whose telephone number is (571)270-1116. The examiner can normally be reached IFT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Q Tran, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HUNG Q DANG/Primary Examiner, Art Unit 2484