Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. PCT/CA2018/051369, filed on 10/29/2018.
Drawings
The drawing submitted on 04/27/2020 has been considered by the examiner.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/14/2022 has been entered. 
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The interpretation of claims 13 and 19 under 35 U.S.C. 112, sixth paragraph, has been withdrawn because the amendment limits the generic placeholder to a processor, which is a specific structure.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-6, 8-18, 20, and 38-40 are rejected under 35 U.S.C. 103 as being unpatentable over Shalon et al. (US 2011/0125063 A1) in view of Goldstein et al. (US 2008/0253583 A1).

Regarding Claim 1, Shalon et al. teach: A method for training a classification module of nonverbal audio events, the method comprising ([0062] To monitor such activities, the system of the present invention includes a sensor unit mountable on or in a body region of the subject. The sensor is selected capable of sensing mechanical (e.g. jaw motion), thermal (e.g. body temperature), electrical (e.g. EKG or EMG) or acoustic activities. Acoustic activity is preferably, non-verbal (i.e. does not result from vocal chord vibrations) acoustic energy at a frequency of 0.001 Hz to 100 kHz, which is generated from mechanically-induced vibrations or motion. Further description of activities and suitable sensors for monitoring such activities is provided hereinbelow. [0063] In some embodiments, a wireless headset system of the present invention comprises several sub-systems: a sensor sensitive to jaw motion, unvoiced mouth sounds (teeth clicks, for example)… All or at least a portion of these subsystems can be integrated into a self-contained package that can be worn by or implanted in the body of the user. [0071] Acoustic energy generated by chewing, swallowing, biting, sipping, drinking, teeth grinding, teeth clicking, tongue clicking, tongue movement, jaw muscles or jaw bone movement, spitting, clearing of the throat, coughing, sneezing, snoring, breathing rate, breathing depth, nature of the breath, heartbeat, digestion, motility to or through the intestines, tooth brushing, smoking, screaming, user's voice or speech, other user generated sounds, and ambient noises in the user's immediate surroundings can be monitored through one or more sensors (e.g. microphones) positioned in or around the ear area, on the skull, neck, throat, chest, back or abdomen regions): occluding (space-filling element or earpiece 902) the ear canal ([0096] …a space-filling element in the ear canal,… The system can be designed with flexible members to universally fit into all ear canals, or a limited number of standard sizes to fit most ear canals, or lastly custom fit for each individual based on his or her ear canal geometry. [0380] The microphone and the support piece all fit within the confines of the outer rim of the concha cavity of the human subject. [0389] The sensor and device are configured to be placed on or in the ear E, which includes the tragus 150, concha 140, antitragus 160, helix 180, and ear canal EC. [0390] In the embodiment shown in FIGS. 15-16, the earpiece 902 is adapted to be placed on the ear 10, and the sensor is adapted to be placed within the ear canal EC. [0392] Earpiece 902 may be made of molded plastic in a single size or one of a small number of predesigned sizes. Alternatively, earpiece 902 may be custom-made to the requirements of an individual wearer by providing a cavity of constant dimensions for the interior components and varying the size of the enclosure and/or of the sensor stalk and elastic wire. The sensor 904 can be coupled to the earpiece and disposed within and in contact with the ear canal EC. NOTE: The sensor and the earpiece are configured to be placed in the ear E. The sensor itself is not occluding, but the earpiece 902, which fits within the cavity of the ear, occludes the ear canal and is coupled to the non-occluding sensor 904, which is attached within the ear canal tissue in a non-occluding, unobtrusive way (see FIGS. 15-16).
As per the applicant argument in the remark, examiner treated fully occluding as per applicant specification [0038-0039] intra-aural hearing protection earpiece 12 such as an earplug or acoustic seal.);using a first microphone(in-ear or ear canal microphone), capturing an in-ear sound pressure present in the occluded ear canal ( [0071] Acoustic energy generated …can be monitored through one or more sensors (e.g. microphones) positioned in or around the ear area,… [0291] The source of the audio transmissions can be a single integrated microphone and sensing system monitoring one location on the user's body. Alternatively, the audio transmissions can be from several microphones or sensing systems in one or more locations of the user's body. [0380] A thin electrical wire transmitted the acoustic energy collected by the microphone … [0408] Chewing events or jaw motions can be sensed by the sensors described herein (e.g., sensors 904, 1800, 200, 301, 1101, and 311) via mechanical deformation of the ear canal that occurs during chewing or acoustic vibrations that occur during chewing. This can allow the system to capture only the sounds during chewing in order to determine the nature of the food being consumed. [0409] The deformations of the ear canal can be detected using sensors described above, including microphones, accelerometers, piezo or strain gage technology known in the art, changes in volume or pressure of a space-filling element in the ear canal, and/or optical sensors. [0410] Referring back to FIGS. 15-16, the processing unit 906 can be configured to process ingestion related activity from the sensor (e.g., deformations or vibrations in the ear canal sensed by the sensor) to determine the quantity of food and drink ingested, duration of eating, total caloric intake, etc. 
[0420] In-ear microphones are susceptible to non-speech sounds including jaw motion sounds, chewing sounds, and saliva sounds.); using a second microphone (open air microphone or outer ear canal microphone), capturing an outer-ear sound pressure present at the outer entry of the occluded ear canal as an outer-ear audio signal; denoising the captured in-ear audio signal using the captured outer-ear audio signal ([0408] With further combining jaw motion sensing and sound detection it is possible to add additional discrimination power between speech sounds from non-speech events and sounds generated by the wearer. This can allow the system to capture only the sounds during chewing in order to determine the nature of the food being consumed. In addition, by distinguishing speech sounds from non-speech sounds emitted by the user, the device can more effectively suppress background noise and wind noise that plagues open air microphones found on most headsets, ... [0413] The sensors 1800 and 200 described above can receive speech signals as vibrations and convert them into an electronic signal. Because the sensors are placed in the ear canal or concha, and are designed to be relatively insensitive to acoustical signals coupled through air, they can reject most ambient noise and wind noise. In addition, their placement inside the outer ear canal affords particularly good signal levels for speech and helps to reject wind noise.); associating at least one nonverbal audio event to the denoised in-ear audio signal; sampling the in-ear audio signal; extracting audio features of each sample of the in-ear audio signal ([0194] A preprocessing stage filters out noise, normalizes the energy level, and segments the sampled sound into analysis frames. Features are then extracted from the signal using spectral signature analysis to identify waveforms with eating microstructure events (signatures). 
The extracted components are then evaluated by a statistical classifier that combines the observed data (the features) with prior information about the patterns to segment the input data into specific event categories such as chews, sips, and speech. The extracted acoustic energy patterns are then mapped into food intake events. [0208] A preprocessing module detects the presence of eating activity and automatically conditions the signal using automatic gain control on the analog signal prior to being digitized. An automatic gain control system can be used to adjust the input gain guaranteeing a good use of the dynamic range while preventing clipping of the signal. The normalized signal is digitized using an analog-to-digital converter with a precision of 16 bits and a sampling rate of 8,000 Hz. [0209] A signal processing module extracts a sequence of salient features from the digitized signal. The digitized signal can be parameterized using short-term analysis frames. The frame rate is 1/100 ms and the analysis window is 500 ms. Methods for extracting periodic information may be based on the autocorrelation method or the Fourier transform. [0214] If necessary, the signal is normalized and ambient noise is subtracted from the signal. Spectral analysis or other data manipulation techniques are performed. Fast Fourier transform (FFT) can be used with appropriate frequency ranges to obtain the power in each frequency band. Such processing can be restricted to signal segments of interest while using `low resolution` spectral analysis to search for the beginning of an event and stand-by mode to conserve power between segments. [0215] The spectral analysis and raw signal are utilized to extract features and categorize them with a time stamp and a fitness score. Other parameters might accompany each feature based on its nature. 
[0216] Utilizing rules relating to the microstructure of eating, the software components can then determine if an event or events fit an acceptable pattern.); extracting (detect the eating patterns of the user over time) audio features of another in-ear audio signal of the same known nonverbal audio event([0106] The system can detect the eating patterns of the user over time, thereby building a database of ingestion behavior and "learning" and customizing the performance of the system to the user. [0107] The data generated by the system for each subject can be integrated and/or aggregated into a flat or relational database containing the behavior related activity signatures collected from a plurality of subjects over a time period.); comparing the said extracted audio features with the extracted audio features of the other in-ear audio signal of the same known nonverbal audio event; validating the extracted audio features based on the comparison of both sets of extracted audio features; associating the validated and extracted audio features to the at least one nonverbal audio event ([0107] The data generated by the system for each subject can be integrated and/or aggregated into a flat or relational database containing the behavior related activity signatures collected from a plurality of subjects over a time period. Such a database can contain proprietary data, publicly available data, anonymously collected data, and/or data collected with subject identification information. Data could be collected at a variety of levels, from raw recordings of sensor data, processed sensor data, activity-related signatures, or high-level behavioral data. 
Such a database would be useful for establishing norms, averages, trends, classifications, calibrations, historical behaviors, reference sets, training data, statistical tests, clinical trials, targeted marketing, third party interventions, and relative scores in a stand-alone manner as pure data or as an integral part of the system used by individual subjects.  By way of example, a database of the physical activity patterns or ingestion patterns of many subjects can be cross referenced to a database of their health, medical, exercise, drug use and/or weight records. By way of a second example, an aggregated database of the ingestion related motion or acoustic energy patterns detected for a plurality of users can be used to train the algorithms used to convert these patterns into classifications of ingestion behaviors. [0194] The extracted components are then evaluated by a statistical classifier that combines the observed data (the features) with prior information about the patterns to segment the input data into specific event categories such as chews, sips, and speech. The extracted acoustic energy patterns are then mapped into food intake events. [0206] The salient acoustic features are extracted using a statistical-based pattern recognition system to classify the sounds into specific events. The output of the recognizer can be a hypothesized event sequence that can be used to track the flow of ingested food. The accuracy of the hypothesized output can be validated using a database of sounds annotated by a panel of human expert listeners. [0210] To classify the input signal into an optimal sequence of eating events processing unit 14 can search a state graph that represents all admissible state sequences. The Hidden-Markov models (HMMs) can be used to represent the microstructure of eating. In this case, an eating even is represented as a bite followed by one or more chews and a swallow.  
[0211] Features such as chew count, chew, duration, and chew energy are computed as part of "eating" events. The features are fed to a mapping algorithm that estimates the weight of the ingested food based on a calibration of the known food weight reference available database used to train system 10. Alternative methods of estimating the ingested food weight include HMM's, neural network, and regression trees. [0219] System 10 needs to detect, infer, or estimate the following based on historical data: swallow event, swallow volume, and energy density or what was swallowed. It is the cumulative product of the three that is of main interest to the user in real-time, at each feeding event, and as a daily summary.).
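By way of illustration only, the sampling and framing steps quoted from Shalon et al. [0208]-[0209] can be sketched as follows. The sketch assumes that "the frame rate is 1/100 ms" means one analysis frame starting every 100 ms; the function names, the short-term energy feature, and the 100 Hz test tone are hypothetical and appear in neither reference.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling rate quoted in [0208]
WINDOW_MS = 500        # analysis window quoted in [0209]
HOP_MS = 100           # assumption: one frame every 100 ms


def frame_signal(samples, sample_rate_hz=SAMPLE_RATE_HZ,
                 window_ms=WINDOW_MS, hop_ms=HOP_MS):
    """Split a sampled signal into overlapping fixed-length analysis frames."""
    window = sample_rate_hz * window_ms // 1000
    hop = sample_rate_hz * hop_ms // 1000
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple illustrative feature)."""
    return sum(x * x for x in frame) / len(frame)


# One second of a 100 Hz tone stands in for a captured in-ear signal.
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE_HZ)
          for n in range(SAMPLE_RATE_HZ)]
frames = frame_signal(signal)
energies = [frame_energy(f) for f in frames]
print(len(frames), len(frames[0]))
```

Under these assumptions each frame holds 4,000 samples, and consecutive frames overlap by 400 ms.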
Shalon et al., however, do not explicitly teach that the space-filling element or earpiece 902 is “fully occluding (plug or sealing) ear canal”.

Goldstein et al. teach: fully occluding (plug or sealing) ear canal to reduce noise([0008] At least one exemplary embodiment is directed to An Always-On Recording System (AORS) comprising: an acoustic monitoring assembly configured to monitor the acoustic field in a user’s immediate environment using an Ambient Sound Microphone (ASM) to monitor sound at an occluded ear canal; a signal processing circuit operatively connected to the assembly, where the signal processing circuit is configured to amplify an ASM signal from the ASM and to equalize for the frequency sensitivity of the ASM; an acoustic field monitoring assembly configured to monitor the acoustic field in the occluded ear canal, comprising an ear canal microphone (ECM) mounted in an earpiece that forms an acoustic seal of the occluded ear canal; a signal processing circuit configured amplify an ECM signal from the ECM and to equalize for the frequency sensitivity of the ECM; and a data storage device configured to act as a circular buffer for constantly storing at least one of ECM signal and ASM signal. [0026] The AORS records audio information can obtain information from at least one acoustic sound sensor(s) (microphones) mounted in the ear-sealing assembly. The ear-sealing assembly can provide a Noise Reduction Rating of 20-30 dB. In at least one exemplary embodiment of the present invention a number of microphones are located outside the occluded ear canal to monitor sound pressure levels near the entrance to the ear canal (this is the Ambient Sound Microphone--ASM) and within the occluded ear canal to facilitate monitoring of sound within the ear canal (this is the Ear Canal Microphone--ECM). [0049] An example of an embodiment of an electroacoustic assembly that the Always On (Headwear) Recording System (AORS) may function with is given in FIG. 1a and FIG. 1b. This shows the earphone body 10, which houses the electro acoustic transducers 34, 28, 32 and electronic units 20, 22, 24, 26. 
The earpiece 8 forms a seal in the ear canal 1 of a user, with the outside end 12 substantially flush with the entrance to the ear canal (i.e. the ear meatus)--i.e. the hearing protection device shown in this embodiment is a "completely in the ear" type, which provides passive sound attenuation of ambient sound transmitted to the ear-drum of the user in the order of 20-25 dB over the frequency range of human hearing (50-20 kHz). [0059] At least one exemplary embodiment is directed to a self-contained Always-On Recording System (AORS) which operates like aviation "flight recorders" by storing a recent history of electronic sound signals presented to a User with an earphone device whilst simultaneously recording sound in the User's local ambient sound field using an Ambient sound Microphone at the entrance to the Users fully or partially occluded ear canal, and simultaneously recording sound in the Users occluded or partially occluded ear canal using an Ear Canal Microphone.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify Shalon et al. to include Goldstein et al.'s teaching of an acoustic monitoring assembly configured to monitor the acoustic field in a user’s immediate environment using an Ambient Sound Microphone (ASM) to monitor sound at an occluded ear canal, in order to record sound in the user's fully occluded ear canal using an Ear Canal Microphone and also to provide passive sound attenuation of ambient sound transmitted to the eardrum of the user.

Regarding Claim 3, Shalon et al. teach:  See rejection of claim 1, and [0062] The sensor is selected capable of sensing mechanical (e.g. jaw motion), thermal (e.g. body temperature), electrical (e.g. EKG or EMG) or acoustic activities. Acoustic activity is preferably, non-verbal (i.e. does not result from vocal chord vibrations) acoustic energy at a frequency of 0.001 Hz to 100 kHz, which is generated from mechanically-induced vibrations or motion.  [0209] A signal processing module extracts a sequence of salient features from the digitized signal. The digitized signal can be parameterized using short-term analysis frames. The frame rate is 1/100 ms and the analysis window is 500 ms. [0210] Another alternative is to detect chew events using a sliding window. [0211] Features such as chew count, chew, duration, and chew energy are computed as part of "eating" events. [0337] Initial test samples were recorded and the input gain of the preamplifier adjusted in order to cover the maximum dynamic range of the audio channel without clipping. Once the session started, all of a subject's data was put in a single audio file at a sampling rate of 16 KHz with 16 bits of precision. A sampling rate of 8 KHz. may be used with no significant loss in performance.
Shalon et al. in view of Goldstein et al. do not explicitly teach: wherein the sampling further comprises sampling a frame having a duration ranging between 200 milliseconds and 1200 milliseconds.
However, the frame durations taught by Shalon et al. (e.g., the 500 ms analysis window of [0209]) overlap or lie inside the claimed range.
In the case where the claimed ranges “overlap or lie inside ranges disclosed by the prior art” a prima facie case of obviousness exists. In re Wertheim, 541 F.2d 257, 191 USPQ 90 (CCPA 1976); In re Woodruff, 919 F.2d 1575, 16 USPQ2d 1934 (Fed. Cir. 1990).
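By way of illustration only, the numerical relationship relied on above can be checked arithmetically. The sketch takes the 500 ms analysis window and the 8,000 Hz sampling rate quoted from Shalon et al., together with the 200 to 1200 millisecond range recited in claim 3; the function and constant names are hypothetical.

```python
CLAIMED_RANGE_MS = (200, 1200)   # frame duration range recited in claim 3
SHALON_WINDOW_MS = 500           # analysis window quoted from [0209]
SHALON_SAMPLE_RATE_HZ = 8000     # sampling rate quoted from [0208]


def lies_inside(value_ms, bounds_ms):
    """Return True when a duration falls within an inclusive range."""
    low_ms, high_ms = bounds_ms
    return low_ms <= value_ms <= high_ms


def samples_per_frame(window_ms, sample_rate_hz):
    """Number of samples covered by one analysis window."""
    return window_ms * sample_rate_hz // 1000


print(lies_inside(SHALON_WINDOW_MS, CLAIMED_RANGE_MS))
print(samples_per_frame(SHALON_WINDOW_MS, SHALON_SAMPLE_RATE_HZ))
```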

Regarding Claim 4, Shalon et al. teach: The method of claim 3 wherein the sampling further comprises sampling a 400 milliseconds frame of the in-ear audio signal (See rejection of claim 3. In the case where the claimed ranges “overlap or lie inside ranges disclosed by the prior art” a prima facie case of obviousness exists. In re Wertheim, 541 F.2d 257, 191 USPQ 90 (CCPA 1976); In re Woodruff, 919 F.2d 1575, 16 USPQ2d 1934 (Fed. Cir. 1990).).

Regarding Claim 5, Shalon et al. teach: The method of claim 1, wherein validating the extracted audio features further comprises comparing the extracted audio features with a plurality of samples of the in-ear audio signal (See rejection of claim 1).

Regarding Claim 6, Shalon et al. teach:  The method of claim 1, wherein validating the extracted audio features further comprises testing (fed to a mapping algorithm) the classification module with the extracted audio features (See rejection of claim 5, specifically [0210] To classify the input signal into an optimal sequence of eating events processing unit 14 can search a state graph that represents all admissible state sequences. The Hidden-Markov models (HMMs) can be used to represent the microstructure of eating. A sample model is shown in FIG. 3. In this case, an eating even is represented as a bite followed by one or more chews and a swallow. The advantage of using HMMs is that the model parameters can be estimated without the need of having detailed time alignments for the underlying structure. This approach makes it possible to estimate model parameters for bites, chews, and swallows without detailed time alignment information which would be very tedious and difficult to obtain for large amounts of data. [0211] Features such as chew count, chew, duration, and chew energy are computed as part of "eating" events. The features are fed to a mapping algorithm that estimates the weight of the ingested food based on a calibration of the known food weight reference available database used to train system 10.).

Regarding Claim 8, Shalon et al. teach:  The method of claim 1, wherein the at least one nonverbal audio event is a user induced nonverbal audio event (See rejection of claim 5, specifically [0219] System 10 needs to detect, infer, or estimate the following based on historical data: swallow event, swallow volume, and energy density or what was swallowed. It is the cumulative product of the three that is of main interest to the user in real-time, at each feeding event, and as a daily summary.).

Regarding Claim 9, Shalon et al. teach:   The method of claim 8, wherein the user induced nonverbal audio event is selected from the group consisting of teeth clicking, tongue clicking, blinking, eye closing, teeth grinding, throat clearing, saliva noise, swallowing, coughing, talking, yawning with inspiration, yawning with expiration, respiration, heartbeat and head or body movement, earpiece manipulation, and any combination thereof (See rejection of claim 5 and [0071] Acoustic energy generated by chewing, swallowing, biting, sipping, drinking, teeth grinding, teeth clicking, tongue clicking, tongue movement, jaw muscles or jaw bone movement, spitting, clearing of the throat, coughing, sneezing, snoring, breathing rate, breathing depth, nature of the breath, heartbeat, digestion, motility to or through the intestines, tooth brushing, smoking, screaming, user's voice or speech, other user generated sounds, and ambient noises in the user's immediate surroundings can be monitored through one or more sensors (e.g. microphones) positioned in or around the ear area, on the skull, neck, throat, chest, back or abdomen regions. The preferred area allows for such sounds to be analyzed in order to determine the nature of the bolus swallowed (type of food, number of chews, hard versus soft chews, crunchy versus soft food, ingestion of liquid etc.). Microphones in different positions or orientations can be tuned to detect sounds originating within the user's body as opposed to ambient sounds surrounding the user. Software can be used to select which microphone is given priority for data collection and analysis based on the situation. Each microphone can be optimized to receive a specific range of sound frequencies corresponding to the signal to be measured. The sensing element can be designed to be sensitive to a wide range of frequencies of the acoustic energy generated in the head region, ranging from approximately 0.001 hertz up to approximately 100 kilohertz. 
The sensing element can be sensitive to just a narrow range of frequencies and a multiplicity of sensing elements used to cover a broader range of frequencies. The sensing element can receive the acoustic energy via air transmission, tissue or bone conduction.).

Regarding Claim 10, Shalon et al. teach:   The method of claim 1, wherein the at least one nonverbal audio event is a mechanically-induced event (ambient noise) that is external to the user (See rejection of claim 9 and [0071] … ambient noises in the user's immediate surroundings can be monitored through one or more sensors (e.g. microphones) positioned in or around the ear area, on the skull, neck, throat, chest, back or abdomen regions. [0075] The ambient noise or the noise of the speaker can be cancelled out from the microphone input using passive and active means such as a focusing diaphragm, …).

Regarding Claim 11, Shalon et al. teach: The method of claim 1, further comprising generating executable instructions for identifying a non-verbal audio of a captured audio signal based on the validated and extracted audio features associated to the at least one nonverbal event, the classification module being configured to execute the generated instructions (See rejection of claim 1, specifically [0106] The system can detect the eating patterns of the user over time, thereby building a database of ingestion behavior and "learning" and customizing the performance of the system to the user. [0107] By way of a second example, an aggregated database of the ingestion related motion or acoustic energy patterns detected for a plurality of users can be used to train the algorithms used to convert these patterns into classifications of ingestion behaviors. [0211] Features such as chew count, chew, duration, and chew energy are computed as part of "eating" events. The features are fed to a mapping algorithm that estimates the weight of the ingested food based on a calibration of the known food weight reference available database used to train system 10. Alternative methods of estimating the ingested food weight include HMM's, neural network, and regression trees. [0256] System 10 can be utilized in conjunction with various diet plans. System 10 may be programmed to measure and analyze eating habits according to specific plans through varying levels of customization of the hardware, software or user interface.).

Regarding Claim 12, Shalon et al. teach: The method of claim 1, further comprising adding to the classification module the validated and extracted features associated to the at least one nonverbal event (See rejection of claim 1 specifically, ([0106] The system can detect the eating patterns of the user over time, thereby building a database of ingestion behavior and "learning" and customizing the performance of the system to the user. [0107] The data generated by the system for each subject can be integrated and/or aggregated into a flat or relational database containing the behavior related activity signatures collected from a plurality of subjects over a time period. By way of example, a database of the physical activity patterns or ingestion patterns of many subjects can be cross referenced to a database of their health, medical, exercise, drug use and/or weight records. By way of a second example, an aggregated database of the ingestion related motion or acoustic energy patterns detected for a plurality of users can be used to train the algorithms used to convert these patterns into classifications of ingestion behaviors. [0206] The salient acoustic features are extracted using a statistical-based pattern recognition system to classify the sounds into specific events. The output of the recognizer can be a hypothesized event sequence that can be used to track the flow of ingested food. The accuracy of the hypothesized output can be validated using a database of sounds annotated by a panel of human expert listeners. [0209] A signal processing module extracts a sequence of salient features from the digitized signal. The digitized signal can be parameterized using short-term analysis frames. The frame rate is 1/100 ms and the analysis window is 500 ms. [0210] To classify the input signal into an optimal sequence of eating events processing unit 14 can search a state graph that represents all admissible state sequences. 
A panel of human expert listeners can annotate training data efficiently using a multimedia recording of the subjects that may include audio, video and position of the mouth or jaw. Once the HMM parameters are estimated it is possible to search for the most likely state sequence given the observed data. The sequence of states is used to hypothesize the sequence of bites, chews, and swallows. Alternative methods of classifying the data include neural networks classifier. Another alternative is to detect chew events using a sliding window. For each window offset a score is computed using a Gaussian mixture model. Alternatively, wavelet methods can be used to classify the relevant chew features.)).
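By way of illustration only, the search for "the most likely state sequence given the observed data" quoted from Shalon et al. [0210] can be sketched with the standard Viterbi algorithm. Shalon et al. do not disclose model parameters, so every probability below, the three discrete observation labels, and the function name are hypothetical values chosen only so that the example runs.

```python
import math

STATES = ("bite", "chew", "swallow")

# Hypothetical HMM parameters (not taken from either reference).
START = {"bite": 0.8, "chew": 0.1, "swallow": 0.1}
TRANS = {
    "bite": {"bite": 0.1, "chew": 0.8, "swallow": 0.1},
    "chew": {"bite": 0.05, "chew": 0.7, "swallow": 0.25},
    "swallow": {"bite": 0.6, "chew": 0.2, "swallow": 0.2},
}
EMIT = {
    "bite": {"loud": 0.7, "rhythmic": 0.2, "quiet": 0.1},
    "chew": {"loud": 0.2, "rhythmic": 0.7, "quiet": 0.1},
    "swallow": {"loud": 0.1, "rhythmic": 0.1, "quiet": 0.8},
}


def viterbi(observations):
    """Return the most likely state sequence for a list of observation labels."""
    # Log-probabilities avoid numerical underflow on long sequences.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["loud", "rhythmic", "rhythmic", "quiet"]))
```

With these hypothetical parameters the decoded sequence is a bite followed by two chews and a swallow, which matches the eating-event structure described in the quoted passage.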

Regarding Claim 13, Shalon et al. teach: A system for training a classification module of nonverbal audio events, the system comprising: an electronic earpiece comprising: an acoustic seal for fully occluding the ear canal; an in-ear microphone for capturing sound pressure present within the fully occluded ear canal as an in-ear audio signal; and an outer-ear microphone for capturing sound pressure present at the outer entry of the occluded ear canal as an outer-ear audio signal; a memory for storing the in-ear audio signal captured by the in-ear microphone; a processing unit configured for: sampling the stored in-ear audio signal present in the memory; extracting a plurality of audio features from the sampled audio signal and extracting audio features of another in-ear audio signal of the same known nonverbal audio event; denoising the captured in-ear audio signal using the captured outer-ear audio signal; comparing the extracted audio features of the in-ear audio signal captured by the in-ear microphone with the extracted audio features of the other in-ear audio signal; validating the extracted audio features of the in-ear audio signal captured by the in-ear microphone based on the comparison of both sets of extracted audio features; receiving a nonverbal audio event definition corresponding to the captured audio signal; associating the validated plurality of audio features to the received nonverbal audio event definition (See the rejection of claim 1).

Regarding Claim 14, Shalon et al. teach:  The system of claim 13, wherein the processing device is further configured to train the classification module to detect at least one of a health indicator, mood indicator, biosignal indicator, artefact indicator, command indicator, non-user induced event indicator, user induced event indicator (See rejection of claim 1, specifically [0066] The system can also be used to monitor and modify behaviors associated with eating disorders such as bulimia and anorexia, as well as to other behaviors including snoring, sleep apnea, bruxism, smoking, alcohol consumption, drug addiction, exercise and physical training, stuttering, panic disorders, attention deficit, hyperactivity disorders, or other disorders that have unique physiological, sound or motion characteristics (i.e. activities) that can be identified and monitored. [0087] The system can also measure the user's heart rate, heart rate coherence, breathing rate or breathing depth patterns or galvanic skin response to assess their stress, fear or anger level and then provide feedback to reduce the stress by, for example, talking the user through breathing exercises. This could be useful in proactively reducing violent activity and impulsive behavior. The galvanic skin response can also be correlated to the general mood of the user and the system can provide encouraging or funny verbal feedback to improve the user's mood. [0107] The data generated by the system for each subject can be integrated and/or aggregated into a flat or relational database containing the behavior related activity signatures collected from a plurality of subjects over a time period. By way of example, a database of the physical activity patterns or ingestion patterns of many subjects can be cross referenced to a database of their health, medical, exercise, drug use and/or weight records. 
By way of a second example, an aggregated database of the ingestion related motion or acoustic energy patterns detected for a plurality of users can be used to train the algorithms used to convert these patterns into classifications of ingestion behaviors. [0255] System 10 can utilize any physical activity indicator alone or a combination of base metabolic rate, physical activity, gender, age, height, weight, medical records, medical background, and health status of the individual to calculate the caloric expenditure and full energy balance of the user.).

Regarding Claim 15, Shalon et al. teach: The system of claim 13, wherein the processing device is further configured to train the classification module to determine a nonverbal audio event according to an in-ear audio signal captured by a health monitoring system (See rejection of claim 14).

Regarding Claim 16, Shalon et al. teach: The system of claim 13, wherein the processing device is further configured to train the classification module to determine a nonverbal audio event according to an in-ear audio signal captured by an artefact removal system (See rejection of claim 14).

Regarding Claim 17, Shalon et al. teach: The system of claim 13, wherein the processing device is further configured to train the classification module to determine a nonverbal audio event according to an in-ear audio signal captured by a biosignal monitoring system (See rejection of claim 14).

Regarding Claim 18, Shalon et al. teach: The system of claim 13, wherein the processing device is further configured to train the classification module to determine a nonverbal audio event according to an in-ear audio signal captured by a silent interface (See rejection of claim 14 and also see [0071] … ambient noises in the user's immediate surroundings can be monitored through one or more sensors (e.g. microphones) positioned in or around the ear area, on the skull, neck, throat, chest, back or abdomen regions. [0075] The ambient noise or the noise of the speaker can be cancelled out from the microphone input using passive and active means such as a focusing diaphragm, …).

Regarding Claim 20, Shalon et al. teach: The system of claim 13, the processing device being configured to generate executable instructions identifying a non-verbal audio event of a captured audio signal based on the validated plurality of audio features associated to the received nonverbal audio event definition, the classification module being configured to execute the generated instructions (See rejection of claim 11).

Regarding Claim 38, Shalon et al. teach: The method of claim 1 wherein the validation of the extracted audio features is performed by executing a machine learning algorithm (See rejection of claim 1 and specifically paragraph [0106] The system can detect the eating patterns of the user over time, thereby building a database of ingestion behavior and "learning" and customizing the performance of the system to the user. [0107] The data generated by the system for each subject can be integrated and/or aggregated into a flat or relational database containing the behavior related activity signatures collected from a plurality of subjects over a time period. Data could be collected at a variety of levels, from raw recordings of sensor data, processed sensor data, activity-related signatures, or high-level behavioral data. By way of a second example, an aggregated database of the ingestion related motion or acoustic energy patterns detected for a plurality of users can be used to train the algorithms used to convert these patterns into classifications of ingestion behaviors. [0211] Features such as chew count, chew duration, and chew energy are computed as part of "eating" events. The features are fed to a mapping algorithm that estimates the weight of the ingested food based on a calibration of the known food weight reference available database used to train system 10.).

Regarding Claim 39, Shalon et al. teach: The system of claim 13, the processing device being further configured to execute a machine learning algorithm to perform validation of the extracted audio features (See rejection of claim 38).

Regarding Claim 40, Shalon et al. teach: The system of claim 13 further comprising an adaptive filter configured to perform the denoising of the captured in-ear audio signal using an estimation of the transfer function between the second and first microphones (See rejection of claim 1 and [0417] A second approach uses an adaptive filter using the microphone as a reference for speech-plus-noise. The bone conduction sensor provides a measure of the desired speech signal, and the adaptive filter removes the ambient noise from the speech-plus-noise signal using the conducted speech as a template.).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Abeyratne et al. (US 10098569 B2) teach: (Abstract) A method of operating a computational device to process patient sounds, the method comprises the steps of: extracting features from segments of said patient sounds; and classifying the segments as cough or non-cough sounds based upon the extracted features and predetermined criteria; and presenting a diagnosis of a disease related state on a display under control of the computational device based on segments of the patient sounds classified as cough sounds.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/ Primary Examiner, Art Unit 2656