Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
In line 1 of claim 16, “The in-ear device of claim 14” should read “The in-ear device of claim 15”. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 6-7, 10-12, 15-18, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Grizzel (US10692489 B1).
With respect to claim 1, Grizzel teaches a method comprising: receiving an audio signal from one or more microphones (C5 para 2: “The audio may be detected by a microphone and the movement input may be detected by a motion sensor.”) of an in-ear device (Col 25 para 1: "As shown in FIG. 7A, an example wearable device 110 may be an earbud wearable device 110a with two sides where each side includes an inner-lobe insert 750 that includes a speaker 101.");
extracting one or more features of the audio signal (C8 2nd to last para: "The AFE 256 may...divide the digitized audio data 111 into frames representing time intervals for which the AFE 256 determines a number of values (i.e., features) representing qualities of the audio data 111, along with a set of those values…representing features/qualities of the audio data 111 within each frame.”);
generating an in-ear data object based on the one or more features (C1: “FIG. 6 illustrates...audio data 111, sensor data 302, gesture data 304 and/or time data 306 to the server(s) 120 via a network(s) 199...”);
and publishing the in-ear data object (C1: “FIG. 3 is a conceptual diagram of a system architecture for parsing incoming utterances using multiple domains according to embodiments of the present disclosure.” The inputs to the NLU are Text data (300), Sensor data (302), Gesture data (304) and Time Data (306)) to an external system via a network (C5 3rd para: “A single server 120 may perform all speech/movement processing or multiple servers 120 may combine to perform all speech/movement processing. Further, the server(s) 120 may execute certain commands, such as answering or responding to spoken utterances of a user of the wearable device 110 and/or answering or responding to certain movements of the user. In addition, certain speech/motion detection or command execution functions may be performed by the voice input device 110.”).
With respect to claim 2, Grizzel teaches wherein extracting the one or more features of the audio signal includes identifying a plurality of tokens based on a speech portion of the audio signal, wherein each token of the one or more tokens represents an utterance identified in the speech portion (C8 para 2: “The ASR component 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111.” Here, each word of the text is considered a token).
“As illustrated in FIG. 3…. When input text data 300 (e.g., ASR output text data) is received, the same text data that is input into the NLU pipeline for domain A 263-A may also be input into the NLU pipeline for domain B 263-B, where the components for domain B 263-B will operate on the text data as if the text data related to domain B, and so on for the different NLU pipelines for the different domains”.)
With respect to claims 4 and 18, Grizzel teaches wherein the in-ear data object consists of the plurality of tokens separate from the audio signal (C1: “FIG. 6 illustrates...audio data 111, sensor data 302, gesture data 304 and/or time data 306 to the server(s) 120 via a network(s) 199...”; the server generates the additional element of text data, C8 para 2: “The ASR component 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111.”).
With respect to claims 6 and 20, Grizzel teaches wherein generating the in-ear data object includes generating one or more feature vectors based on the audio signal (C8 2nd to last para: "The AFE 256 may...divide the digitized audio data 111 into frames representing time intervals for which the AFE 256 determines a number of values (i.e., features) representing qualities of the audio data 111, along with a set of those values…representing features/qualities of the audio data 111 within each frame.”).
With respect to claim 7, Grizzel teaches wherein the one or more feature vectors are used to generate the one or more tokens (C8 para 2: “The ASR component 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111.” Here, each word of the text is considered a token).
With respect to claim 10, Grizzel teaches wherein the in-ear data object further includes motion data that is indicative of a motion of a head of the user, where the motion data is collected from one or more motion sensors of the in-ear device (C23 last para: “The wearable device 110 may include at least one motion sensor 630 for detecting movement of the wearable device 110”, and “The motion sensor(s) 630 may convert the detected movement to sensor data 302.”).
With respect to claim 11, Grizzel teaches wherein generating the in-ear data object further includes: labeling at least a subset of the tokens with respective labels (C11 para 5: “For example, the NER component 262 may parse the spoken utterance to identify words as subject, object, verb, preposition, etc.,...For instance, an utterance of “play mother's little helper by the rolling stones” might be parsed and tagged as {Verb}: “Play,”...”); identifying one or more relationships between two or more tokens of the plurality of tokens (C11 last para: “For instance, an utterance of “play mother's little helper by the rolling stones” might be parsed and tagged as {Verb}: “Play,” {Object}: “mother's little helper,” {Object Preposition}: “by,” and {Object Modifier}: “the rolling stones.” ... it is determined that these phrases relate to the grammatical object of the spoken utterance....The frameworks linked to the intent are then used to determine what database fields should be searched to determine the meaning of these phrases...”); generating an annotation object based on the labels and the one or more relationships (C11 2nd to last para: “This process includes semantic tagging, which is the labeling of a word or combination of words according to their type/semantic meaning. Parsing may be performed using heuristic grammar rules, or the NER component 262 may be constructed using techniques such as HMMs, maximum entropy models, log linear models, conditional random fields (CRF), and the like.”, and “The output from the NLU processing, which may include tagged text data, commands, etc., may then be sent to a command processor 290”); and including the annotation object in the in-ear data object (C12 last para: “The output from the NLU processing, which may include tagged text data, commands, etc., may then be sent to a command processor 290”).
With respect to claim 12, Grizzel teaches wherein the external system (C34 2nd para: “As shown in FIG. 18… The device may then send sensor data 302 corresponding to the motion to the server(s) 120.”) is a data analytics system that utilizes the output object to train a machine-learned model (C34: “Various machine learning techniques may be used to perform the training... the ASR component 250/NLU component 260 in using motion data to perform ASR/NLU respectively, or other components.”).
With respect to claim 15, Grizzel teaches an in-ear device comprising: a housing configured and dimensioned to fit in an ear canal of a user (Fig 7A, and C29 para 3: “A device 110 and/or server 120 may store data corresponding to a plurality of movement signatures/gestures. ... in the case of an ear bud wearable device that is inserted into the ear canal of a user”); one or more sensors (“FIG. 8 illustrates example operation of a wearable device with motion sensor...”); one or more microphones (Col 3 1st para: "The device that captures spoken audio from a user may take one or more forms... other type of wearable microphone-enabled device."); a communication unit configured to communicate via a network (Col 5 3rd para: "As shown in FIG. 1, the system 100 may include one or more wearable devices 110 configured to process voice and motion inputs that are local to the user 5, as well as one or more network(s) 199. The server(s) 120 (which may be one or more different physical devices) may be capable of performing traditional speech processing...”); a processing device that executes computer-readable instructions (Col 35 last para to Col 36 1st para: "Each of these devices... may include one or more controllers/processors... and a memory... for storing data and instructions of the respective device...Each device may also include a data storage...for storing data and controller/processor-executable instructions....Computer instructions for operating each device (110/120) and its various components may be executed by the respective device's controller(s)/processor(s)...using the memory (1906/2006) as temporary “working” storage at runtime. A device's computer instructions may be stored in a non-transitory manner in non-volatile memory (1906/2006), storage (1908/2008), or an external device(s). Alternatively, some or all of the executable instructions may be embedded in hardware or firmware on the respective device in addition to or instead of software.”) that cause the processing device to: receive an audio signal from the one or more microphones; extract one or more features of the audio signal (C8 2nd to last para: “The AFE 256 may...divide the digitized audio data 111 into frames representing time intervals for which the AFE 256 determines a number of values (i.e., features) representing qualities of the audio data 111, along with a set of those values (i.e., a feature vector or audio feature vector) representing features/qualities of the audio data 111 within each frame.”); generate an in-ear data object based on the one or more features (“FIG. 6 illustrates...audio data 111, sensor data 302, gesture data 304 and/or time data 306 to the server(s) 120 via a network(s) 199...”); and publish the in-ear data object (C1: “FIG. 3 is a conceptual diagram of a system architecture for parsing incoming utterances using multiple domains according to embodiments of the present disclosure.” The inputs to the NLU are Text data (300), Sensor data (302), Gesture data (304) and Time Data (306)) to an external system via a network (C5 3rd para: “A single server 120 may perform all speech/movement processing or multiple servers 120 may combine to perform all speech/movement processing. Further, the server(s) 120 may execute certain commands, such as answering or responding to spoken utterances of a user of the wearable device 110 and/or answering or responding to certain movements of the user. In addition, certain speech/motion detection or command execution functions may be performed by the voice input device 110.”).
With respect to claim 16, Grizzel teaches wherein extracting the one or more features of the audio signal includes identifying a plurality of tokens based on a speech portion of the audio signal (the audio data 111 of the in-ear object is further processed to generate text data, C8 para 2: “The ASR component 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111.” The text data with its tokens now becomes part of the in-ear object), wherein each token of the one or more tokens represents an utterance identified in the speech portion (C8 para 2: “The ASR component 250 transcribes the audio data 111 into text data representing words of speech contained in the audio data 111.” Here, each word of the text is considered a token).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Grizzel in view of Setlur (US RE38649 E).
With respect to claims 5 and 19, Grizzel does not teach wherein the in-ear data object includes at least one metric representing a count of a set of words in utterances identified in the speech portion.
Setlur teaches wherein the in-ear data object includes at least one metric representing a count of a set of words in utterances identified in the speech portion (Col 5 para 2-3: “Step 320 determines if a first word of the predetermined grammar has been recognized within the utterance. ... At step 324 a respective count of the number of words in the most likely word of sequences are made. In step 324 the decoding tree contents for the present frame and counts of the number of words of all the viable word sequences are examined.”).
st paragraph)

Claims 8, 9, 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Grizzel in view of Sheynblat (US 20180113673 A1).
With respect to claim 8, Grizzel does not teach wherein the in-ear data object further includes location data indicating a geolocation of the in-ear device.  
Sheynblat teaches wherein the in-ear data object further includes location data indicating a geolocation of the in-ear device ([0020] “In addition to employing Bluetooth to identify nearby devices, the earbud also is equipped with a WiFi transceiver and/or a GPS or other type of location signal receiver and can determine its location based on received Bluetooth, WiFi, GPS, or other signals.”).  
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Grizzel to include the teachings of Sheynblat, the motivation being that, over time, as new locations are learned, these locations serve as contextual information for selecting devices that are commanded to perform an action ([0020], Sheynblat).
With respect to claim 9, Grizzel does not teach wherein the in-ear data object further includes heartrate data indicating a heartrate of a user of the in-ear device.
Sheynblat teaches wherein the in-ear data object further includes heartrate data indicating a heartrate of a user of the in-ear device ([0024] “Referring now to FIGS. 1A-1B, FIGS. 1A-1B show an example in-ear device 100 for control of remote devices ... These contact points may also provide means for measuring biometric parameters such as heart rate...”).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Grizzel to include the teachings of Sheynblat, motivation being 
With respect to claim 13, Grizzel does not teach wherein the external system is a user device associated with a user of the in-ear device, and wherein the user device utilizes the annotation object to train a machine-learned model that is personalized for the user.  
Sheynblat teaches the external system is a user device associated with a user of the in-ear device, and wherein the user device utilizes the annotation object to train a machine-learned model that is personalized for the user ([0062] “For example, the in-ear device may, in some examples, comprise one or more sensors, such as accelerometers...The processor 150 may receive one or more signals from one or more accelerometers, gyroscopes, or other sensors and, based on one or more machine learning techniques, verify that a movement or gait pattern, heart rate, or speaking patterns corresponds to a particular wearer....Suitable machine learning techniques may be trained explicitly by the wearer based on walking pace...”)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Grizzel to include the teachings of Sheynblat, the motivation being to allow multi-factor authentication techniques by using suitable machine learning techniques trained explicitly to the user based on sensor data ([0062], Sheynblat).
With respect to claim 14, Grizzel does not teach wherein the user is identified by comparing the frequency spectrum of the in-ear data object with stored values attributed to the user.
Sheynblat teaches wherein the user is identified by comparing the frequency spectrum of the in-ear data object with stored values attributed to the user ([0039] “....By performing spectral analysis of the audio signals received from the microphones 120-122 and the signals received by the bone conduction microphone, the in-ear device 100 is able to determine that the audio signals and the signals from the conduction microphones correspond to the same sounds and thus the received audio signals were generated by the wearer.").
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify the teachings of Grizzel to include the teachings of Sheynblat, the motivation being to confirm identity based on audio signals and signals received via bone conduction ([0039], Sheynblat).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA, whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/A.N.P./ Examiner, Art Unit 2657
/Paras D Shah/ Primary Examiner, Art Unit 2659

02/17/2021