DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-25 are pending in this application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-10, 12-18, and 20-25 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20200309930 A1) in view of Surace (US 20210082208 A1).
Regarding claim 1, Zhou teaches an apparatus to detect a physical change in an environment (See Zhou: Fig. 23, and [0401], “FIG. 23 illustrates a system block diagram including constituent components of an example mobile device, in accordance with an embodiment of the acoustic-based echo-signature system, including an example computing system”), the apparatus comprising:
a descriptor generator to generate a first descriptor, the descriptor generator including (See Zhou: Figs. 2A-F, and [0103], “During face registration 41, the system registers a user's face 9 using traditional image based face recognition methods. The registered image based profile can then be used for first round recognition, and also for retrieving a user's acoustic profile for a second level of verification. Respective facial fiducial points are extracted during step 44 using for example, existing algorithms. The system records the locations of such facial fiducial points using for example, a descriptor associated with the relative location/orientation between the smart device 7 and the user's face 9. Based on the locations of the fiducial points that are received by the system processor, the relative location between the face 9 and the camera is determined during step 45. A system processor, echo-signature processing device, echo-signature engine or processor, or a computing device associated with the echo-signature registration and/or authentication platform, can compute such values”): 
a chirp producer to emit a chirp into the environment (See Zhou: Fig. 3, and [0121], “In certain embodiments or aspects, as shown in FIG. 3, the earpiece speaker 101, top microphone 100, and frontal camera 103 are implemented individually or in combination for even more robust acoustic/visual sensing. The earpiece speaker 101 may selected for sound emitting for generally two reasons: 1) it is a design that exists on most smartphone devices. The location for the top microphone 100 is suitable for “illuminating” the user's face. Alternatively, the main speaker 104 comprises a more diverse design, either located at the bottom or on the back of the device 7; and 2) the earpiece speaker 101 is close to frontal camera 103, which minimizes alignment errors when the frontal camera is used for adjusting the phone pose relative to the user 3”; and [0245], “In the disclosed embodiment, the system measures the arrival time of each echo by a technique Frequency-Modulated Continuous Wave (FMCW) technique used in radars. In traditional FMCW, the speaker transmits continuous chirp signals with linear increasing frequency, from f.sub.min to f.sub.max. In order to estimate the distance from an object, FMCW compares the frequency of the echo signal to that of a reference signal using a technique called signal mixing, as shown in step 191, to find the frequency shift Δf (as shown in FIG. 6), which is proportional to the distance. Thus finding Δf provides the distance (i.e., Δf multiplying a constant coefficient)”); 
a chirp recorder to record a response to the chirp from the environment (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
a chirp response encoder to generate an encoding of the response to the chirp (See Zhou: Figs. 2A-F, and [0372], “The Response Delay was also evaluated. The response delay is the time needed for the system to produce an authentication result after the raw input signal is ready (referring to Table 4). Samsung S8 exhibits the least delay with an average of ˜15 ms, and the other two devices (Samsung S7 Edge, and Huawei P9) exhibit a delay of 32-45 ms. The delay approaches maximum when the user keeps moving the phone in seeking to align the face in the valid area, which incurs a lot of camera preview refreshing and rendering. The delay is generally also affected by other computation heavy background applications. For real-time continuous authentication, the delay between consecutive sound signal emitting is 50 ms. Preferably, in echo-signature system, authentication is performed every other instance of sound signal emitting, leaving sufficient time for processing”; and Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”);
a descriptor similarity generator to generate a similarity value, the similarity value to compare the first descriptor to a second descriptor (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
a physical change indicator to, in response to the similarity value exceeding a similarity threshold (See Zhou: Fig. 20, and [0368], “User appearance changes such as wearing glasses and/or hats can cause changes in the reflected acoustic signals, thus generating more false negatives and low recall. In order to combat such problems, the SVM model was re-trained with data samples of new appearances in addition to the existing training data. FIG. 20 is a graphical representation in tabular format showing the average recall of 5 users with different appearance changes before/after model update using additional ˜1 minute's data. It is noted that without re-training, the recall values were reduced to single digits. After the re-training, the values increased back to normal levels, so correct users can pass easily. This indicates that re-training is effective at combating such changes”), indicate that a physical change has occurred in the environment.
However, Zhou fails to explicitly disclose a physical change indicator to, in response to the similarity value exceeding a similarity threshold, indicate that a physical change has occurred in the environment.
Surace teaches a physical change indicator to, in response to the similarity value exceeding a similarity threshold, indicate that a physical change has occurred in the environment (See Surace: Figs. 1-3, and [0076], “The suggestions may correspond to the one or more indicators (and/or correspond to the determination of the environment change and/or the vehicle parameter events). For instance, the suggestions may include (1) when the status message includes a structural indicator, a structural suggestion that may indicate the aircraft 131 may need maintenance and/or to be grounded immediately; (2) when the status message includes a battery indicator, a battery suggestion that may indicate the battery may need maintenance, cannot complete the next planned flight so the aircraft should be grounded, etc.; (3) when the status message includes an actuation system indicator, an actuation system suggestion that may indicate the actuation system 360 may need maintenance or the aircraft should be grounded immediately; (4) when the status message includes a flight path confirmation indicator, a flight path confirmation suggestion that may indicate that aircraft 131 is deviating significantly from the planned flight path 340 (e.g., due to weather, traffic, new obstacles, etc.); (5) when the status message includes a flight spacing indicator, a flight spacing suggestion that may indicate that the distance between the aircraft 131 should be increased or that the number of aircraft should be decreased for a given area/route 141; and (6) when the status message includes an environment change indicator, an environment change suggestion that may indicate that the obstacle information of the collective vehicle data is to be updated and the obstacle data of the obstacle database 356 on the one or more aircraft 131 is to be updated”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhou to have a physical change indicator to, in response to the similarity value exceeding a similarity threshold, indicate that a physical change has occurred in the environment as taught by Surace in order to check for sufficient vehicle structural integrity, sufficient power system reserves and health, sufficient vehicle spacing within traffic limitations, and obstacle avoidance (See Surace: [0018], “For instance, the system of the present disclosure may gather, store, and process vehicle data to ensure certification compliance and provide additional feedback to UAM operators. For instance, the system of the present disclosure may analyze the vehicle data and check for sufficient vehicle structural integrity, sufficient power system reserves and health, sufficient vehicle spacing within traffic limitations, and obstacle avoidance. As an example, the system of the present disclosure may inform operators of battery information (charge, discharge rate, health, etc.), health-of-vehicle information (structural or actuation systems), location history and flight plan tracking, etc. Furthermore, the system of the present disclosure can provide go versus no-go decisions to operators”). Zhou teaches a method and system that may identify a person based on the image features and the facial echo features of the person by eliminating the effect of changes in the person’s pose, hair style, clothing color, etc.; Surace teaches a system and method that may collect data, analyze the data, detect an event or environmental change, and display an indicator of the environmental change in order to avoid obstacles. Therefore, it would have been obvious to one of ordinary skill in the art to modify Zhou by Surace to include an indicator that indicates a physical change in the environment.
The motivation to modify Zhou by Surace is the rationale of “Use of known technique to improve similar devices (methods, or products) in the same way” (see MPEP § 2143(I)(C)).
Regarding claim 2, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou teaches the apparatus of claim 1, wherein the chirp response encoder includes using an auto-encoder neural network to generate the encoding (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”).
Regarding claim 4, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou teaches the apparatus of claim 1, wherein the chirp response encoder is further to emit a second chirp, record a second response to the second chirp from the environment, and generate a second encoding of the second response to the second chirp (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”; and [0123], “A Hanning window is applied to re-shape the pulse envelop in order to increase its peak-to-side lobe ratio, thereby producing higher SNR for echoes. In authentication modes that require continuous sound-emitting phase, a delay of 50 ms for each pulse may be implemented, such that echoes from two consecutive pulses do not overlap”).
Regarding claim 5, Zhou and Surace teach all the features with respect to claim 4 as outlined above. Further, Zhou teaches the apparatus of claim 4, wherein the response is a first response, the chirp is a first chirp, and wherein the chirp response encoder further includes a first auto-encoder neural network to encode the first response to the first chirp and a second auto-encoder neural network to encode the second response to the second chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 6, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou teaches the apparatus of claim 1, further including a compensation framing executor to execute a compensation framing process on the response to the chirp, the compensation framing to compensate for a distance between a source of the chirp and a location of the recording of the response to the chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 7, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou teaches the apparatus of claim 1, wherein the descriptor generator is further to determine whether a deviation of a first encoding of a first chirp response from a second encoding of a second chirp response exceeds a deviation threshold and, in response to determining that the deviation exceeds the deviation threshold, record a third chirp response and generate a third encoding, the third encoding corresponding to the third chirp response (See Zhou: Fig. 5, and [0132], “FIG. 5 is a graphical representation of distance measurements from acoustics, vision and calibrated acoustics. The dotted line in FIG. 5 shows the distance measurements from acoustics while the device 7 is being moved back and forth from the face 9. It can be observed that some outliers due to such “jumping” of the outliers 110 from the general grouping 111 of the acoustic signals. In order to solve this problem with “jumping”, a vision-aided major echo locating technique can be implemented comprising of two steps in certain disclosed embodiments”; and [0136], “A second step in accomplishing the removal of the outliers' problem is implementation of vision-aided major echo locating technique. Although vision based distance measurement is generally considered more stable than acoustics, vision based measurements cannot capture the error caused by rotations of either the smartphone device 7 or user's face 9. Thus, the vision calibrated distance measurement is used in certain embodiments, in order to narrow down the major echo searching range and reduce any respective outliers. The system still implements cross-correlation to find the exact major peak location within this range. However, that the device user 3 face 9 cannot rotate to extreme angles, otherwise facial landmark detection may fail”).
Regarding claim 8, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou and Surace teach the apparatus of claim 1, wherein the descriptor generator is further to include a weather sampler to sample weather in the environment including a current time, a temperature value (See Zhou: Figs. 10A-B, and [0219], “When the algorithm searches the neighbors of any object by using Retrieve_Neighbors function, it takes into consideration both spatial and temporal neighborhoods. The non-spatial value of an object such as a temperature value is compared with the non-spatial values of spatial neighbors and also with the values of temporal neighbors (previous day in the same year, next day in the same year, and the same day in other years)”), a relative humidity value, and a pressure value (See Surace: Figs. 3A-B, and [0040], “The edge sensors 312 on the structures 346 of the aircraft 131 may be sensors to detect various environmental and/or system status information. For instance, some of the edge sensors 312 may monitor for discrete signals, such as edge sensors on seats (e.g., occupied or not), doors (e.g., closed or not), etc. of the aircraft 131. Some of the edge sensors 312 may monitor continuous signals, such as edge sensors on tires (e.g., tire pressure), brakes (e.g., engaged or not, amount of wear, etc.), passenger compartment (e.g., compartment air pressure, air composition, temperature, etc.), support structure (e.g., deformation, strain, etc.), etc., of the aircraft 131. The edge sensors 312 may transmit edge sensor data to the vehicle management computer 302 to report the discrete and/or continuous signals”).
Regarding claim 9, which recites a method corresponding to the apparatus of claim 1, Zhou and Surace teach a method of detecting a physical change in an environment (See Zhou: Fig. 23, and [0401], “FIG. 23 illustrates a system block diagram including constituent components of an example mobile device, in accordance with an embodiment of the acoustic-based echo-signature system, including an example computing system”), the method comprising:
generating, by executing an instruction with a processor, a first descriptor, the generating including (See Zhou: Figs. 2A-F, and [0103], “During face registration 41, the system registers a user's face 9 using traditional image based face recognition methods. The registered image based profile can then be used for first round recognition, and also for retrieving a user's acoustic profile for a second level of verification. Respective facial fiducial points are extracted during step 44 using for example, existing algorithms. The system records the locations of such facial fiducial points using for example, a descriptor associated with the relative location/orientation between the smart device 7 and the user's face 9. Based on the locations of the fiducial points that are received by the system processor, the relative location between the face 9 and the camera is determined during step 45. A system processor, echo-signature processing device, echo-signature engine or processor, or a computing device associated with the echo-signature registration and/or authentication platform, can compute such values”):
emitting a chirp into the environment (See Zhou: Fig. 3, and [0121], “In certain embodiments or aspects, as shown in FIG. 3, the earpiece speaker 101, top microphone 100, and frontal camera 103 are implemented individually or in combination for even more robust acoustic/visual sensing. The earpiece speaker 101 may selected for sound emitting for generally two reasons: 1) it is a design that exists on most smartphone devices. The location for the top microphone 100 is suitable for “illuminating” the user's face. Alternatively, the main speaker 104 comprises a more diverse design, either located at the bottom or on the back of the device 7; and 2) the earpiece speaker 101 is close to frontal camera 103, which minimizes alignment errors when the frontal camera is used for adjusting the phone pose relative to the user 3”; and [0245], “In the disclosed embodiment, the system measures the arrival time of each echo by a technique Frequency-Modulated Continuous Wave (FMCW) technique used in radars. In traditional FMCW, the speaker transmits continuous chirp signals with linear increasing frequency, from f.sub.min to f.sub.max. In order to estimate the distance from an object, FMCW compares the frequency of the echo signal to that of a reference signal using a technique called signal mixing, as shown in step 191, to find the frequency shift Δf (as shown in FIG. 6), which is proportional to the distance. Thus finding Δf provides the distance (i.e., Δf multiplying a constant coefficient)”);
recording a response to the chirp from the environment (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
generating an encoding of the response to the chirp (See Zhou: Figs. 2A-F, and [0372], “The Response Delay was also evaluated. The response delay is the time needed for the system to produce an authentication result after the raw input signal is ready (referring to Table 4). Samsung S8 exhibits the least delay with an average of ˜15 ms, and the other two devices (Samsung S7 Edge, and Huawei P9) exhibit a delay of 32-45 ms. The delay approaches maximum when the user keeps moving the phone in seeking to align the face in the valid area, which incurs a lot of camera preview refreshing and rendering. The delay is generally also affected by other computation heavy background applications. For real-time continuous authentication, the delay between consecutive sound signal emitting is 50 ms. Preferably, in echo-signature system, authentication is performed every other instance of sound signal emitting, leaving sufficient time for processing”; and Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”);
generating, by executing an instruction with the processor, a similarity value, the similarity value to compare the first descriptor to a second descriptor (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
in response to the similarity value exceeding a similarity threshold (See Zhou: Fig. 20, and [0368], “User appearance changes such as wearing glasses and/or hats can cause changes in the reflected acoustic signals, thus generating more false negatives and low recall. In order to combat such problems, the SVM model was re-trained with data samples of new appearances in addition to the existing training data. FIG. 20 is a graphical representation in tabular format showing the average recall of 5 users with different appearance changes before/after model update using additional ˜1 minute's data. It is noted that without re-training, the recall values were reduced to single digits. After the re-training, the values increased back to normal levels, so correct users can pass easily. This indicates that re-training is effective at combating such changes”), indicating, by executing an instruction with the processor, that a physical change has occurred in the environment (See Surace: Figs. 1-3, and [0076], “The suggestions may correspond to the one or more indicators (and/or correspond to the determination of the environment change and/or the vehicle parameter events). For instance, the suggestions may include (1) when the status message includes a structural indicator, a structural suggestion that may indicate the aircraft 131 may need maintenance and/or to be grounded immediately; (2) when the status message includes a battery indicator, a battery suggestion that may indicate the battery may need maintenance, cannot complete the next planned flight so the aircraft should be grounded, etc.; (3) when the status message includes an actuation system indicator, an actuation system suggestion that may indicate the actuation system 360 may need maintenance or the aircraft should be grounded immediately; (4) when the status message includes a flight path confirmation indicator, a flight path confirmation suggestion that may indicate that aircraft 131 is deviating significantly from the planned flight path 340 (e.g., due to weather, traffic, new obstacles, etc.); (5) when the status message includes a flight spacing indicator, a flight spacing suggestion that may indicate that the distance between the aircraft 131 should be increased or that the number of aircraft should be decreased for a given area/route 141; and (6) when the status message includes an environment change indicator, an environment change suggestion that may indicate that the obstacle information of the collective vehicle data is to be updated and the obstacle data of the obstacle database 356 on the one or more aircraft 131 is to be updated”).
Regarding claim 10, Zhou and Surace teach all the features with respect to claim 9 as outlined above. Further, Zhou teaches the method of claim 9, wherein the generating an encoding includes using an auto-encoder neural network (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”).
Regarding claim 12, Zhou and Surace teach all the features with respect to claim 9 as outlined above. Further, Zhou teaches the method of claim 9, wherein the generating an encoding includes emitting a second chirp, recording a second response to the second chirp from the environment, and generating a second encoding of the second response to the second chirp (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”; and [0123], “A Hanning window is applied to re-shape the pulse envelop in order to increase its peak-to-side lobe ratio, thereby producing higher SNR for echoes. In authentication modes that require continuous sound-emitting phase, a delay of 50 ms for each pulse may be implemented, such that echoes from two consecutive pulses do not overlap”).
Regarding claim 13, Zhou and Surace teach all the features with respect to claim 12 as outlined above. Further, Zhou teaches the method of claim 12, wherein the response is a first response, the chirp is a first chirp, and wherein the generating an encoding includes a first auto-encoder neural network to encode the first response to the first chirp and a second auto-encoder neural network to encode the second response to the second chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 14, Zhou and Surace teach all the features with respect to claim 9 as outlined above. Further, Zhou teaches the method of claim 9, further including framing the response to the chirp, the framing to compensate for a distance between a source of the chirp and a location of the recording of the response to the chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 15, Zhou and Surace teach all the features with respect to claim 9 as outlined above. Further, Zhou teaches the method of claim 9, further including determining whether a deviation of a first encoding of a first chirp response from a second encoding of a second chirp response exceeds a deviation threshold and, in response to determining that the deviation exceeds the deviation threshold, recording a third chirp response and generate a third encoding, the third encoding corresponding to the third chirp response (See Zhou: Fig. 5, and [0132], “FIG. 5 is a graphical representation of distance measurements from acoustics, vision and calibrated acoustics. The dotted line in FIG. 5 shows the distance measurements from acoustics while the device 7 is being moved back and forth from the face 9. It can be observed that some outliers due to such “jumping” of the outliers 110 from the general grouping 111 of the acoustic signals. In order to solve this problem with “jumping”, a vision-aided major echo locating technique can be implemented comprising of two steps in certain disclosed embodiments”; and [0136], “A second step in accomplishing the removal of the outliers' problem is implementation of vision-aided major echo locating technique. Although vision based distance measurement is generally considered more stable than acoustics, vision based measurements cannot capture the error caused by rotations of either the smartphone device 7 or user's face 9. Thus, the vision calibrated distance measurement is used in certain embodiments, in order to narrow down the major echo searching range and reduce any respective outliers. The system still implements cross-correlation to find the exact major peak location within this range. However, that the device user 3 face 9 cannot rotate to extreme angles, otherwise facial landmark detection may fail”).
Regarding claim 16, Zhou and Surace teach all the features with respect to claim 9 as outlined above. Further, Zhou and Surace teach the method of claim 9, wherein the generating a first descriptor further includes sampling weather in the environment, the sampling weather including sampling a current time, a temperature value (See Zhou: Figs. 10A-B, and [0219], “When the algorithm searches the neighbors of any object by using Retrieve_Neighbors function, it takes into consideration both spatial and temporal neighborhoods. The non-spatial value of an object such as a temperature value is compared with the non-spatial values of spatial neighbors and also with the values of temporal neighbors (previous day in the same year, next day in the same year, and the same day in other years)”), a relative humidity value, and a pressure value (See Surace: Figs. 3A-B, and [0040], “The edge sensors 312 on the structures 346 of the aircraft 131 may be sensors to detect various environmental and/or system status information. For instance, some of the edge sensors 312 may monitor for discrete signals, such as edge sensors on seats (e.g., occupied or not), doors (e.g., closed or not), etc. of the aircraft 131. Some of the edge sensors 312 may monitor continuous signals, such as edge sensors on tires (e.g., tire pressure), brakes (e.g., engaged or not, amount of wear, etc.), passenger compartment (e.g., compartment air pressure, air composition, temperature, etc.), support structure (e.g., deformation, strain, etc.), etc., of the aircraft 131. The edge sensors 312 may transmit edge sensor data to the vehicle management computer 302 to report the discrete and/or continuous signals”).
Regarding claim 17, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou and Surace teach at least one non-transitory computer readable medium comprising instructions that, when executed (See Zhou: Fig. 23, and [0401], “FIG. 23 illustrates a system block diagram including constituent components of an example mobile device, in accordance with an embodiment of the acoustic-based echo-signature system, including an example computing system”), cause at least one processor to at least:
generate a first descriptor (See Zhou: Figs. 2A-F, and [0103], “During face registration 41, the system registers a user's face 9 using traditional image based face recognition methods. The registered image based profile can then be used for first round recognition, and also for retrieving a user's acoustic profile for a second level of verification. Respective facial fiducial points are extracted during step 44 using for example, existing algorithms. The system records the locations of such facial fiducial points using for example, a descriptor associated with the relative location/orientation between the smart device 7 and the user's face 9. Based on the locations of the fiducial points that are received by the system processor, the relative location between the face 9 and the camera is determined during step 45. A system processor, echo-signature processing device, echo-signature engine or processor, or a computing device associated with the echo-signature registration and/or authentication platform, can compute such values”);
cause a chirp to be emitted into an environment (See Zhou: Fig. 3, and [0121], “In certain embodiments or aspects, as shown in FIG. 3, the earpiece speaker 101, top microphone 100, and frontal camera 103 are implemented individually or in combination for even more robust acoustic/visual sensing. The earpiece speaker 101 may selected for sound emitting for generally two reasons: 1) it is a design that exists on most smartphone devices. The location for the top microphone 100 is suitable for “illuminating” the user's face. Alternatively, the main speaker 104 comprises a more diverse design, either located at the bottom or on the back of the device 7; and 2) the earpiece speaker 101 is close to frontal camera 103, which minimizes alignment errors when the frontal camera is used for adjusting the phone pose relative to the user 3”; and [0245], “In the disclosed embodiment, the system measures the arrival time of each echo by a technique Frequency-Modulated Continuous Wave (FMCW) technique used in radars. In traditional FMCW, the speaker transmits continuous chirp signals with linear increasing frequency, from f.sub.min to f.sub.max. In order to estimate the distance from an object, FMCW compares the frequency of the echo signal to that of a reference signal using a technique called signal mixing, as shown in step 191, to find the frequency shift Δf (as shown in FIG. 6), which is proportional to the distance. Thus finding Δf provides the distance (i.e., Δf multiplying a constant coefficient)”);
record a response to the chirp from the environment (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”);
generate an encoding of the response to the chirp (See Zhou: Figs. 2A-F, and [0372], “The Response Delay was also evaluated. The response delay is the time needed for the system to produce an authentication result after the raw input signal is ready (referring to Table 4). Samsung S8 exhibits the least delay with an average of ˜15 ms, and the other two devices (Samsung S7 Edge, and Huawei P9) exhibit a delay of 32-45 ms. The delay approaches maximum when the user keeps moving the phone in seeking to align the face in the valid area, which incurs a lot of camera preview refreshing and rendering. The delay is generally also affected by other computation heavy background applications. For real-time continuous authentication, the delay between consecutive sound signal emitting is 50 ms. Preferably, in echo-signature system, authentication is performed every other instance of sound signal emitting, leaving sufficient time for processing”; and Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”);
generate a similarity value, the similarity value to compare the first descriptor to a second descriptor (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and
in response to the similarity value exceeding a similarity threshold (See Zhou: Fig. 20, and [0368], “User appearance changes such as wearing glasses and/or hats can cause changes in the reflected acoustic signals, thus generating more false negatives and low recall. In order to combat such problems, the SVM model was re-trained with data samples of new appearances in addition to the existing training data. FIG. 20 is a graphical representation in tabular format showing the average recall of 5 users with different appearance changes before/after model update using additional ˜1 minute's data. It is noted that without re-training, the recall values were reduced to single digits. After the re-training, the values increased back to normal levels, so correct users can pass easily. This indicates that re-training is effective at combating such changes”), indicate that a physical change has occurred in the environment (See Surace: Figs. 1-3, and [0076], “The suggestions may correspond to the one or more indicators (and/or correspond to the determination of the environment change and/or the vehicle parameter events). 
For instance, the suggestions may include (1) when the status message includes a structural indicator, a structural suggestion that may indicate the aircraft 131 may need maintenance and/or to be grounded immediately; (2) when the status message includes a battery indicator, a battery suggestion that may indicate the battery may need maintenance, cannot complete the next planned flight so the aircraft should be grounded, etc.; (3) when the status message includes an actuation system indicator, an actuation system suggestion that may indicate the actuation system 360 may need maintenance or the aircraft should be grounded immediately; (4) when the status message includes a flight path confirmation indicator, a flight path confirmation suggestion that may indicate that aircraft 131 is deviating significantly from the planned flight path 340 (e.g., due to weather, traffic, new obstacles, etc.); (5) when the status message includes a flight spacing indicator, a flight spacing suggestion that may indicate that the distance between the aircraft 131 should be increased or that the number of aircraft should be decreased for a given area/route 141; and (6) when the status message includes an environment change indicator, an environment change suggestion that may indicate that the obstacle information of the collective vehicle data is to be updated and the obstacle data of the obstacle database 356 on the one or more aircraft 131 is to be updated”).
Regarding claim 18, Zhou and Surace teach all the features with respect to claim 17 as outlined above. Further, Zhou teaches the at least one non-transitory computer readable medium of claim 17, wherein the instructions, when executed, cause the at least one processor to generate an encoding including using an auto-encoder neural network (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”).
Regarding claim 20, Zhou and Surace teach all the features with respect to claim 17 as outlined above. Further, Zhou teaches the at least one non-transitory computer readable medium of claim 17, wherein the instructions, when executed, cause the at least one processor to cause a second chirp to be emitted into the environment, record a second response to the second chirp from the environment, and generate a second encoding of the second response to the second chirp (See Zhou: Fig. 22, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”; and [0123], “A Hanning window is applied to re-shape the pulse envelop in order to increase its peak-to-side lobe ratio, thereby producing higher SNR for echoes. In authentication modes that require continuous sound-emitting phase, a delay of 50 ms for each pulse may be implemented, such that echoes from two consecutive pulses do not overlap”).
Regarding claim 21, Zhou and Surace teach all the features with respect to claim 20 as outlined above. Further, Zhou teaches the at least one non-transitory computer readable medium of claim 20, wherein the response is a first response, the chirp is a first chirp, and wherein the instructions, when executed, cause the at least one processor to use a first auto-encoder neural network to encode the first response to the first chirp and a second auto-encoder neural network to encode the second response to the second chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 22, Zhou and Surace teach all the features with respect to claim 17 as outlined above. Further, Zhou teaches the at least one non-transitory computer readable medium of claim 17, wherein the instructions, when executed, further cause the at least one processor to execute a compensation framing process on the response to the chirp, the compensation framing to compensate for a distance between a source of the chirp and a location of the recording of the response to the chirp (See Zhou: Fig. 4, and [0125], “A graphical representation of a sample recording segment of a received signal after noise removal is shown in FIG. 4. The direct path segment is defined as the emitting signal traveling from speaker to the microphone directly, which ideally should be a copy of the emitting signal and exhibits the highest amplitude, in certain embodiments. The major echo corresponds to the mix of echoes from the major surfaces (e.g., cheek, forehead) of the face. Other surfaces of the face (e.g., nose, chin) at different distances to the phone also produce echoes, arriving earlier/later than the major echo. The face region echoes include all these echoes, capturing the full information of the face. Accurate segmenting of the face region echoes is critical to minimize the disturbances from dynamic clutters around the phone, and reduce the data dimension for model training and performance”).
Regarding claim 23, Zhou and Surace teach all the features with respect to claim 17 as outlined above. Further, Zhou teaches the at least one non-transitory computer readable medium of claim 17, wherein the instructions, when executed, further cause the at least one processor to determine whether a deviation of a first encoding of a first chirp response from a second encoding of a second chirp response exceeds a deviation threshold and, in response to determining that the deviation exceeds the deviation threshold, to record a third chirp response and generate a third encoding, the third encoding corresponding to the third chirp response (See Zhou: Fig. 5, and [0132], “FIG. 5 is a graphical representation of distance measurements from acoustics, vision and calibrated acoustics. The dotted line in FIG. 5 shows the distance measurements from acoustics while the device 7 is being moved back and forth from the face 9. It can be observed that some outliers due to such “jumping” of the outliers 110 from the general grouping 111 of the acoustic signals. In order to solve this problem with “jumping”, a vision-aided major echo locating technique can be implemented comprising of two steps in certain disclosed embodiments”; and [0136], “A second step in accomplishing the removal of the outliers' problem is implementation of vision-aided major echo locating technique. Although vision based distance measurement is generally considered more stable than acoustics, vision based measurements cannot capture the error caused by rotations of either the smartphone device 7 or user's face 9. Thus, the vision calibrated distance measurement is used in certain embodiments, in order to narrow down the major echo searching range and reduce any respective outliers. The system still implements cross-correlation to find the exact major peak location within this range. However, that the device user 3 face 9 cannot rotate to extreme angles, otherwise facial landmark detection may fail”).
Regarding claim 24, Zhou and Surace teach all the features with respect to claim 17 as outlined above. Further, Zhou and Surace teach the at least one non-transitory computer readable medium of claim 17, wherein the instructions, when executed, further cause the at least one processor to sample weather in the environment including a current time, a temperature value (See Zhou: Figs. 10A-B, and [0219], “When the algorithm searches the neighbors of any object by using Retrieve_Neighbors function, it takes into consideration both spatial and temporal neighborhoods. The non-spatial value of an object such as a temperature value is compared with the non-spatial values of spatial neighbors and also with the values of temporal neighbors (previous day in the same year, next day in the same year, and the same day in other years)”), a relative humidity value, and a pressure value (See Surace: Figs. 3A-B, and [0040], “The edge sensors 312 on the structures 346 of the aircraft 131 may be sensors to detect various environmental and/or system status information. For instance, some of the edge sensors 312 may monitor for discrete signals, such as edge sensors on seats (e.g., occupied or not), doors (e.g., closed or not), etc. of the aircraft 131. Some of the edge sensors 312 may monitor continuous signals, such as edge sensors on tires (e.g., tire pressure), brakes (e.g., engaged or not, amount of wear, etc.), passenger compartment (e.g., compartment air pressure, air composition, temperature, etc.), support structure (e.g., deformation, strain, etc.), etc., of the aircraft 131. The edge sensors 312 may transmit edge sensor data to the vehicle management computer 302 to report the discrete and/or continuous signals”).
Regarding claim 25, Zhou and Surace teach all the features with respect to claim 1 as outlined above. Further, Zhou and Surace teach an apparatus to detect a physical change in an environment (See Zhou: Fig. 23, and [0401], “FIG. 23 illustrates a system block diagram including constituent components of an example mobile device, in accordance with an embodiment of the acoustic-based echo-signature system, including an example computing system”), the apparatus comprising: 
means for generating to generate a first descriptor, the means for generating including (See Zhou: Figs. 2A-F, and [0103], “During face registration 41, the system registers a user's face 9 using traditional image based face recognition methods. The registered image based profile can then be used for first round recognition, and also for retrieving a user's acoustic profile for a second level of verification. Respective facial fiducial points are extracted during step 44 using for example, existing algorithms. The system records the locations of such facial fiducial points using for example, a descriptor associated with the relative location/orientation between the smart device 7 and the user's face 9. Based on the locations of the fiducial points that are received by the system processor, the relative location between the face 9 and the camera is determined during step 45. A system processor, echo-signature processing device, echo-signature engine or processor, or a computing device associated with the echo-signature registration and/or authentication platform, can compute such values”): 
means for causing to cause a chirp to be emitted into the environment (See Zhou: Fig. 3, and [0121], “In certain embodiments or aspects, as shown in FIG. 3, the earpiece speaker 101, top microphone 100, and frontal camera 103 are implemented individually or in combination for even more robust acoustic/visual sensing. The earpiece speaker 101 may selected for sound emitting for generally two reasons: 1) it is a design that exists on most smartphone devices. The location for the top microphone 100 is suitable for “illuminating” the user's face. Alternatively, the main speaker 104 comprises a more diverse design, either located at the bottom or on the back of the device 7; and 2) the earpiece speaker 101 is close to frontal camera 103, which minimizes alignment errors when the frontal camera is used for adjusting the phone pose relative to the user 3”; and [0245], “In the disclosed embodiment, the system measures the arrival time of each echo by a technique Frequency-Modulated Continuous Wave (FMCW) technique used in radars. In traditional FMCW, the speaker transmits continuous chirp signals with linear increasing frequency, from f.sub.min to f.sub.max. In order to estimate the distance from an object, FMCW compares the frequency of the echo signal to that of a reference signal using a technique called signal mixing, as shown in step 191, to find the frequency shift Δf (as shown in FIG. 6), which is proportional to the distance. Thus finding Δf provides the distance (i.e., Δf multiplying a constant coefficient)”); 
means for recording to record a response to the chirp from the environment (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
means for encoding to generate an encoding of the response to the chirp (See Zhou: Figs. 2A-F, and [0372], “The Response Delay was also evaluated. The response delay is the time needed for the system to produce an authentication result after the raw input signal is ready (referring to Table 4). Samsung S8 exhibits the least delay with an average of ˜15 ms, and the other two devices (Samsung S7 Edge, and Huawei P9) exhibit a delay of 32-45 ms. The delay approaches maximum when the user keeps moving the phone in seeking to align the face in the valid area, which incurs a lot of camera preview refreshing and rendering. The delay is generally also affected by other computation heavy background applications. For real-time continuous authentication, the delay between consecutive sound signal emitting is 50 ms. Preferably, in echo-signature system, authentication is performed every other instance of sound signal emitting, leaving sufficient time for processing”; and Fig. 21, and [0392], “Input/output circuitry 335 may be operative to convert (and encode/decode, if necessary) analog signals and other signals into digital data. In some embodiments, input/output circuitry can also convert digital data into any other type of signal, and vice-versa. For example, input/output circuitry 335 may receive and convert physical contact inputs (e.g., from a multi-touch screen), physical movements (e.g., from a mouse or sensor), analog audio signals (e.g., from a microphone), or any other input. The digital data can be provided to and received from processor 331, storage 332, memory 333, or any other component of electronic device 330. Although input/output circuitry 335 is illustrated in FIG. 22 as a single component of electronic device 330, several instances of input/output circuitry can be included in electronic device 330”); 
means for comparing to generate a similarity value, the similarity value to compare the first descriptor to a second descriptor (See Zhou: Figs. 2A-F, and [0106], “During an example user authentication process as shown for example in FIG. 2C, users would need to pass both traditional face recognition and acoustic verification process in order to attain or achieve system access to the device 7. First the user's face 9 is compared to the image profiles in the database for pre-screening, following for example, traditional face recognition methods. If a matched profile is found in step 49, this will trigger the face fiducial points detection and acoustic sensing module, which finds the relative location between face and camera, emits a designed signal and records the reflection signal. Then, a system algorithm extracts features from the reflection signals and matches the features given the relative location. This can be achieved by computing similarity metrics using correlation or machine learning regression algorithms. If the similarity is above a certain threshold, (for example, 75%, 85%, 95% similarity to) authentication is approved during authentication phase 53. Otherwise, user access is denied in step 61”); and 
means for indicating to indicate that a physical change has occurred in the environment (See Surace: Figs. 1-3, and [0076], “The suggestions may correspond to the one or more indicators (and/or correspond to the determination of the environment change and/or the vehicle parameter events). For instance, the suggestions may include (1) when the status message includes a structural indicator, a structural suggestion that may indicate the aircraft 131 may need maintenance and/or to be grounded immediately; (2) when the status message includes a battery indicator, a battery suggestion that may indicate the battery may need maintenance, cannot complete the next planned flight so the aircraft should be grounded, etc.; (3) when the status message includes an actuation system indicator, an actuation system suggestion that may indicate the actuation system 360 may need maintenance or the aircraft should be grounded immediately; (4) when the status message includes a flight path confirmation indicator, a flight path confirmation suggestion that may indicate that aircraft 131 is deviating significantly from the planned flight path 340 (e.g., due to weather, traffic, new obstacles, etc.); (5) when the status message includes a flight spacing indicator, a flight spacing suggestion that may indicate that the distance between the aircraft 131 should be increased or that the number of aircraft should be decreased for a given area/route 141; and (6) when the status message includes an environment change indicator, an environment change suggestion that may indicate that the obstacle information of the collective vehicle data is to be updated and the obstacle data of the obstacle database 356 on the one or more aircraft 131 is to be updated”).


Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (US 20200309930 A1) in view of Surace (US 20210082208 A1), and further in view of Hao et al. (US 20200082245 A1).
Regarding claim 3, Zhou and Surace teach all the features of claim 2 as outlined above. However, Zhou fails to explicitly disclose that the auto-encoder neural network of claim 2 has an undercomplete topology.
Hao teaches an auto-encoder neural network having an undercomplete topology, as recited in claim 3 (See Hao: Fig. 3, and [0046], “FIG. 3 illustrates a deep auto-encoder 300 to be employed in embodiments of the present disclosure. The input data 302 to the deep auto-encoder 300 comprises time slices of the sensor traces derived from a matrix representation of a plurality of time-series traces of a plurality of sensors associated with the manufacturing tools 101. The deep auto-encoder 300 may comprise an input layer 304, one or more hidden layers 306, a central bottleneck layer 308, an output layer 310, and a full set of connections 312 between the layers. The structure of the hidden layers 306 is symmetric with respect to the bottleneck layer 308, which has the smallest number of nodes. The bottleneck layer 308 is employed to help the neural network 300 find the minimal representation of the input data 302 reconstructed to the output data 314 by extracting a limited number of features that represent the input data 302. A designer only needs to define the number of layers in the deep auto-encoder 300, and how many nodes there are to be in each of the layers. The deep auto-encoder 300 is trained with example traces having no anomalies and is configured to produce output data 314 that is a reconstruction of the plurality of traces corresponding to the input data 302, wherein the output data 314 has minimized reconstruction error (e.g., the mean squared error, or MSE) relative to the input traces. The reconstruction error is minimized for a minimum set of the global and time invariant features learns by the deep-auto encoder 300 during training necessary to reproduce the input sensor traces”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhou such that the auto-encoder neural network of claim 2 has an undercomplete topology, as taught by Hao, in order to improve efficiency and allow for better strategic planning with respect to the manufacturing process (See Hao: [0086], “The techniques allow for intelligent predictions of substrate quality based on manufacturing data, and allow for efficient decisions to be made regarding corrective actions to be taken with respect to individual substrates and other aspects of the manufacturing process. Use of embodiments of the present disclosure may reduce costs, improve efficiency, and allow for better strategic planning with respect to the manufacturing process”). Zhou teaches a method and system that may identify persons with a neural network based on the image features and the facial echo features of a person by eliminating the effect of changes in the person’s pose, hairstyle, clothing color, etc.; Hao teaches a system and method that may derive a model of training time-series traces with an undercomplete-topology neural network to minimize the reconstruction error of the training time-series traces. Therefore, it would have been obvious to one of ordinary skill in the art to modify Zhou by Hao to use an undercomplete-topology neural network to identify persons based on the audio and image input signals in order to improve efficiency. The rationale for modifying Zhou by Hao is “simple substitution of one known element for another to obtain predictable results.”
Regarding claim 11, Zhou and Surace teach all the features with respect to claim 10 as outlined above. Further, Hao teaches that the method of claim 10, wherein the auto-encoder neural network has an undercomplete topology (See Hao: Fig. 3, and [0046], “FIG. 3 illustrates a deep auto-encoder 300 to be employed in embodiments of the present disclosure. The input data 302 to the deep auto-encoder 300 comprises time slices of the sensor traces derived from a matrix representation of a plurality of time-series traces of a plurality of sensors associated with the manufacturing tools 101. The deep auto-encoder 300 may comprise an input layer 304, one or more hidden layers 306, a central bottleneck layer 308, an output layer 310, and a full set of connections 312 between the layers. The structure of the hidden layers 306 is symmetric with respect to the bottleneck layer 308, which has the smallest number of nodes. The bottleneck layer 308 is employed to help the neural network 300 find the minimal representation of the input data 302 reconstructed to the output data 314 by extracting a limited number of features that represent the input data 302. A designer only needs to define the number of layers in the deep auto-encoder 300, and how many nodes there are to be in each of the layers. The deep auto-encoder 300 is trained with example traces having no anomalies and is configured to produce output data 314 that is a reconstruction of the plurality of traces corresponding to the input data 302, wherein the output data 314 has minimized reconstruction error (e.g., the mean squared error, or MSE) relative to the input traces. The reconstruction error is minimized for a minimum set of the global and time invariant features learns by the deep-auto encoder 300 during training necessary to reproduce the input sensor traces”).
Regarding claim 19, Zhou and Surace teach all the features with respect to claim 18 as outlined above. Further, Hao teaches that the at least one non-transitory computer readable medium of claim 18, wherein the auto-encoder neural network has an undercomplete topology (See Hao: Fig. 3, and [0046], “FIG. 3 illustrates a deep auto-encoder 300 to be employed in embodiments of the present disclosure. The input data 302 to the deep auto-encoder 300 comprises time slices of the sensor traces derived from a matrix representation of a plurality of time-series traces of a plurality of sensors associated with the manufacturing tools 101. The deep auto-encoder 300 may comprise an input layer 304, one or more hidden layers 306, a central bottleneck layer 308, an output layer 310, and a full set of connections 312 between the layers. The structure of the hidden layers 306 is symmetric with respect to the bottleneck layer 308, which has the smallest number of nodes. The bottleneck layer 308 is employed to help the neural network 300 find the minimal representation of the input data 302 reconstructed to the output data 314 by extracting a limited number of features that represent the input data 302. A designer only needs to define the number of layers in the deep auto-encoder 300, and how many nodes there are to be in each of the layers. The deep auto-encoder 300 is trained with example traces having no anomalies and is configured to produce output data 314 that is a reconstruction of the plurality of traces corresponding to the input data 302, wherein the output data 314 has minimized reconstruction error (e.g., the mean squared error, or MSE) relative to the input traces. The reconstruction error is minimized for a minimum set of the global and time invariant features learns by the deep-auto encoder 300 during training necessary to reproduce the input sensor traces”).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/             Primary Examiner, Art Unit 2612