DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 4/1/2021.   Claims 1-20 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings filed on 4/1/2021 have been accepted and considered by the Examiner.

Information Disclosure Statement
4.	The information disclosure statement (IDS) submitted on November 5, 2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
5.	Claim 2 is objected to because of the following informalities:  Claim 2 recites “the short frames.”  There is no antecedent basis for this term.  Appropriate correction is required.


Claim Rejections - 35 USC § 102
6.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


7.	Claims 1, 2, 4-8 and 11 are rejected under 35 U.S.C. 102 as anticipated by “Speaker Recognition with Cough, Laugh and ‘Wei’” (Zhang et al., hereinafter “Zhang”) (cited in IDS dated 11/5/2021).
With regard to Claim 1, Zhang describes:
“A method comprising:
recording an audio sample representative of a cough episode; (Introduction, page 497 describes that audio data of a cough is recorded.)
converting the audio sample to a spectrogram; (The first step in Figure 2 shows that a spectrogram is generated to be input into a neural network.)
segmenting the spectrogram into frames; (Section II describes that a window size for input into the NN is 20 frames.  Figure 2 shows the window over the spectrogram.)
processing the frames using a neural network to create a plurality of embeddings, wherein each of the plurality of embeddings corresponds with a respective one of the frames;  (Section II describes that speaker features are determined from the frames.  Speaker features are cited as “embeddings.”  Section II further describes that the features are averaged over all of the frames to create a “d-vector”, and thus there must be a feature in each frame.)
combining the plurality of embeddings to obtain a global embedding; (Section II describes a “d-vector” created by averaging the features over all the frames, which is cited as a “global embedding.”) and
predicting whether the audio sample is from an enrolled user based, at least in part, on the global embedding.”  (Section IV(C) describes the results for identifying a user based on their cough.)
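For reference only, the claim 1 steps as mapped above (spectrogram, segmentation into frames, one embedding per frame, averaging into a global embedding, and a prediction against an enrolled user) can be sketched as follows. This is a minimal illustration under stated assumptions. It is not Zhang's network and not Applicant's implementation. The embedding function is a hypothetical stand-in (a fixed random projection in place of a trained neural network). The window size, hop, frame length, embedding dimension, and 0.8 cosine threshold are likewise illustrative assumptions.

```python
import cmath
import math
import random


def spectrogram(samples, win=64, hop=32):
    """Magnitude spectrogram: naive DFT of Hann-windowed slices, one column per slice."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win - 1)) for n in range(win)]
    columns = []
    for start in range(0, len(samples) - win + 1, hop):
        seg = [samples[start + n] * hann[n] for n in range(win)]
        columns.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win) for n in range(win)))
            for k in range(win // 2 + 1)
        ])
    return columns


def segment(spec, frame_len=4):
    """Split the spectrogram into non-overlapping frames of frame_len columns."""
    return [spec[i:i + frame_len] for i in range(0, len(spec) - frame_len + 1, frame_len)]


def make_embedder(n_bins, dim=8, seed=0):
    """Hypothetical stand-in for a trained network: time-average, then a fixed random projection."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(dim)]

    def embed(frame):
        pooled = [sum(col[b] for col in frame) / len(frame) for b in range(n_bins)]
        return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

    return embed


def global_embedding(embeddings):
    """Combine per-frame embeddings by element-wise averaging (d-vector style)."""
    return [sum(vals) / len(embeddings) for vals in zip(*embeddings)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embed_recording(samples, embed):
    """Spectrogram -> frames -> one embedding per frame -> averaged global embedding."""
    frames = segment(spectrogram(samples))
    return global_embedding([embed(f) for f in frames])


def enroll(recordings, embed):
    """Enrollment profile: average of the global embeddings of known recordings."""
    return global_embedding([embed_recording(r, embed) for r in recordings])


def predict_enrolled(samples, profile, embed, threshold=0.8):
    """Predict whether the sample comes from the enrolled user via cosine similarity."""
    return cosine_similarity(embed_recording(samples, embed), profile) >= threshold
```

In this sketch, the averaging in `global_embedding` illustrates the combining step, `cosine_similarity` illustrates a cosine-based prediction, and `enroll` illustrates aggregating known recordings into a reference profile.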
With regard to Claim 2, Zhang describes:
“applying channel-wise average pooling to the plurality of embeddings to create an intermediate embedding for each of the short frames; (The intermediate embeddings are formed by the Max Pool layers shown in Figure 2.) and
combining the intermediate embeddings to obtain the global embedding.”  (Figure 2 shows that the output of the Max Pool layers is used to create the “d-vector.”)


With regard to Claim 4, Zhang describes “the neural network is trained using a multi-task learning technique.”  (Section I describes that the NN is trained with laugh, cough, and voice data, and is thus a multi-task learning technique.)
With regard to Claim 5, Zhang describes “the multi-task learning technique comprises training on cough episodes and speech segments.”  (Section I describes that the NN is trained with laugh, cough, and voice data, which includes cough episodes and speech segments.)
With regard to Claim 6, Zhang describes “the combining comprises averaging the plurality of embeddings.” (Section II describes a “d-vector” created by averaging the features over all the frames, which is cited as a “global embedding.”)
With regard to Claim 7, Zhang describes “the predicting uses a cosine similarity metric.”  (The end of Section IV(B) describes the use of cosine distance as an evaluation method, which is cited as “a cosine similarity metric.”)
With regard to Claim 8, Zhang describes “the neural network comprises a plurality of network nodes arranged in convolutional layers.” (Figure 2 shows that the NN includes convolutional layers.)
With regard to Claim 11, Zhang describes:
“enrolling a plurality of utterances from the enrolled user by aggregating known utterances from the enrolled user; (Section IV(A) describes that 120 seconds of aggregated utterances is used from 2,500 male and 2,500 female speakers) and
comparing the global embedding to the known utterances, wherein the predicting whether the audio sample is from the enrolled user is based, at least in part, on the comparison.”  (Section IV(C) shows that determining the identity of the subject is based on the d-vector, cited as the global embedding.)

Claim Rejections - 35 USC § 103
8.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


9.	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of US Pat. App. Pub. No. 20190108444 (Song et al., hereinafter “Song”).
With regard to Claim 3, Zhang does not explicitly describe “creating an intermediate embedding for each of the short frames utilizes a fully-connected layer of the neural network.”
However, paragraph 63 of Song describes creating embeddings using a fully-connected layer.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the fully-connected layers as described by Song into the system of Zhang to create multiple dense embeddings, as described at paragraph 63 of Song.

10.	Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of US Pat. App. Pub. No. 20200380365 (Karino).
With regard to Claim 9, Zhang does not explicitly describe “each convolutional layer of the convolutional layers is followed by batch normalization (batch-norm) and a rectified linear unit (ReLu).”
However, Figure 3 of Karino shows that the convolutional layers are followed by batch normalization and a ReLU (black circles, described in paragraph 51).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the batch normalization and ReLU as described by Karino into the system of Zhang to prevent gradient disappearance, as described at paragraph 52 of Karino.
With regard to Claim 10, Zhang does not explicitly describe “a skip connection between an output of a first convolutional layer's batch-norm and an output of a final convolutional layer's batch-norm.”
However, Figure 3 of Karino shows a skip connection between the first and last batch norms (arrows, as described in paragraph 54).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the skip connections as described by Karino into the system of Zhang to further reduce gradient disappearance, as described at paragraph 8 of Karino.
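For reference only, the layer arrangement recited in claims 9 and 10 can be sketched as shown below. Each convolutional layer is followed by batch normalization and a ReLU, and a skip connection joins the first layer's batch-norm output to the final layer's batch-norm output. This is not Karino's Figure 3 network, and the sketch rests on several simplifying assumptions:

- It works on single-channel 1-D sequences.
- `batch_norm` normalizes over the positions of one sequence rather than over a training batch.
- The kernels are illustrative.
- The skip connection is added before the final ReLU.

```python
import math


def conv1d(x, kernel):
    """'Same'-length 1-D cross-correlation with zero padding (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Simplified batch-norm: normalize with this sequence's own mean and variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def conv_stack(x, kernels):
    """Each layer: conv -> batch-norm -> ReLU.  The first layer's batch-norm output is
    added to the final layer's batch-norm output (skip connection) before the last ReLU."""
    h, first_bn = list(x), None
    for i, kernel in enumerate(kernels):
        bn = batch_norm(conv1d(h, kernel))
        if i == 0:
            first_bn = bn
        if i == len(kernels) - 1:
            bn = [a + b for a, b in zip(bn, first_bn)]  # skip connection
        h = relu(bn)
    return h
```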

11.	Claims 12-19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of US Pat. App. Pub. No. 20150073306 (Abeyratne et al., hereinafter “Abe”).
With regard to Claim 12, Zhang describes:
“receive coughs and test utterances from a plurality of individuals; (Section IV(A) describes test data that includes coughs and utterances from 104 people.) and
train a neural network to generate a global embedding based on respective ones of the coughs and test utterances, wherein the global embedding is indicative of the individual corresponding to the respective coughs and test utterances, (Figure 2 shows a neural network that generates a global embedding, the “d-vector.”) including:
perform speaker verification based on the test utterances; (Section IV(C), Table II shows the speaker verification based on test utterances.) and
perform cougher verification based on the coughs.” (Section IV(C), Table II shows the cougher verification based on coughs.)
Zhang does not explicitly describe “A non-transitory computer readable medium comprising instructions executable to cause a processor to:”
However, paragraph 91 of Abe describes a computer readable medium including processor instructions. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the medium as described by Abe into the system of Zhang to store instructions for a processor, as described at paragraph 91 of Abe.
With regard to Claim 13, Zhang describes:
“convert the cough into a spectrogram; (Figure 2 shows a spectrogram of the audio data as a first step.) and
segment the spectrogram into a plurality of frames; (Section II describes that a window size for input into the NN is 20 frames.  Figure 2 shows the window over the spectrogram.) and
process the plurality of frames to create an embedding; (Section II describes that speaker features are determined from the frames.  Speaker features are cited as “embeddings.”) and
predict whether the individual is enrolled.” (Section IV(C), Table II shows the speaker verification results based on test utterances.)
With regard to Claim 14, Zhang does not explicitly describe “the speaker verification is tested on a natural cough dataset comprising cough embeddings for a plurality of users.”
However, paragraph 120 of Abe describes a cough dataset with 665 cough sounds. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the cough dataset as described by Abe into the system of Zhang to allow for recognition of different types of cough sounds, as described at paragraph 117 of Abe.
With regard to Claim 15, Zhang does not explicitly describe “the speaker verification is prioritized before the cougher verification.”  However, there are only two possibilities in this regard: (1) speaker verification first, or (2) cougher verification first.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to perform speaker verification first, because one of the two must be performed first and there is a finite number of solutions.  KSR International Co. v. Teleflex Inc., 550 U.S. 398 (2007).  See MPEP 2144.05(II)(B).
With regard to Claim 16, Zhang describes:
“at least one [[processor]] configured to create a global embedding based, at least in part, on the audio sample and predict a comparison result indicative of whether audio sample is from an enrolled user;” (Figure 2 shows a neural network that generates a global embedding, the “d-vector.” Section IV(C), Table II shows the user verification results based on test sounds.)
Zhang does not explicitly describe a processor or:
“a microphone configured to receive an audio sample representative of a health sound audio; 
a display coupled to the processor and configured to display the comparison result.”
However, Abe describes a processor 70 at paragraph 91, and:
“a microphone configured to receive an audio sample representative of a health sound audio; (microphones 53, 55 described at paragraph 88)
a display coupled to the processor and configured to display the comparison result.” (display 56 described at paragraph 89)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the hardware as described by Abe into the system of Zhang to allow for input and output of data, as described at paragraphs 88 and 89 of Abe.
With regard to Claim 17, Zhang describes:
“encode the health sound audio to obtain an alternate representation; (Figure 2 shows a neural network that generates a global embedding, the “d-vector.”)
compare the alternate representation to enrolled health sound audio samples from known users; (Section IV(B) describes comparing the d-vectors to the known test data.) and
associate a selected one of the known users to the health sound audio based on the comparison.” (Section IV(C), Table II shows the user verification results based on test sounds.)
With regard to Claim 18, Zhang describes:
“the processor is further configured to compare the global embedding to the enrolled health sound audio samples from the known users and predict whether the audio sample is from the enrolled user based, at least in part, on the comparison between the global embedding and the enrolled health sound audio samples.” (Section IV(C), Table II shows the user verification results based on test sounds.)
With regard to Claim 19, Zhang does not explicitly describe “a network communication interface configured to transmit the comparison result to a party.”
However, paragraph 93 of Abe describes transmitting data to a party over a network. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the network as described by Abe into the system of Zhang to allow for an application on a phone connected to a network to be used for sound recognition, as described at paragraph 93 of Abe.

12.	Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Abe and Karino.
With regard to Claim 20, Zhang describes “a neural network configured to utilize a network architecture to process the audio sample representative of a cough episode to create the global embedding, (Figure 2 shows a neural network that generates a global embedding, the “d-vector.”) wherein the network architecture comprises a plurality of layers including a convolutional layer and a final layer,” (Figure 2 shows a convolutional layer and a final layer.)
Zhang does not explicitly describe “a skip connection links an output of the first convolution layer to an output of the final layer.”
However, Figure 3 of Karino shows a skip connection between the first and last batch norms (arrows, as described in paragraph 54).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the skip connections as described by Karino into the system of Zhang to further reduce gradient disappearance, as described at paragraph 8 of Karino.

Conclusion
13.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Pat. App. Pub. No. 20190192047 (Stamatopoulos et al.) also performs spectrogram analysis on input cough data.
14.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656