DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Specification
The specification fails to disclose the claimed feature “the convolutional neural network layer, the attention layer, and the recursive neural network layer trained to generate the acoustic classification using backpropagation” as recited in claim 8. Instead, the specification states only, in broad terms, that “Optimization (e.g., backpropagation, network training) may occur over the fully connected layer 510 and the SoftMax layer 515 to maximize the likelihood of generating an accurate summary vector (e.g., enhanced characterization data)” (USPGPub US 20210082453 A1, para 48). This passage describes backpropagation over the fully connected layer 510 and the SoftMax layer 515 only. It does not support the claimed feature, which additionally requires that backpropagation be used to train the “attention layer” and the “convolution layer”.
Appropriate correction is required.

Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the feature “the convolutional neural network layer, the attention layer, and the recursive neural network layer trained to generate the acoustic classification using backpropagation” as recited in claim 8 must be shown or the feature(s) canceled from the claim.  No new matter should be entered. Note: it is well known in the art that backpropagation is used in recursive neural networks and LSTM-based bidirectional neural networks, but its use in attention layers and convolutional neural network layers is rare, and the specification fails to disclose how backpropagation is also used in the “convolutional layer” and the “attention layer”. See the specification objection set forth above.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.  A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. 
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-6, 9-13, 15-16, and 25 of U.S. Patent No. 10878837 B1. Although the conflicting claims are not identical, they are not patentably distinct from each other because the claims of the instant application are broader versions of the claims of U.S. Patent No. 10878837 B1. The following is a comparison between the claims of the instant application and the conflicting claims of U.S. Patent No. 10878837 B1:

Claims of the instant application (listed first) and conflicting claim(s) in U.S. Patent No. 10878837 B1 (listed second)
1. A method comprising: identifying sound recording data on a user device; generating, by the user device, an acoustic classification using from the sound recording data using an acoustic classification neural network, the acoustic classification neural network comprising a convolutional neural network layer to generate image values that are weighted by an attention layer that updates a recursive neural network layer; and storing the acoustic classification on the user device.

3. The method of claim 1, wherein the recursive neural network layer processes data from the convolutional neural network layer over time steps of recursive neural network layer.

4. The method of claim 3, wherein feature data output by the convolutional neural network layer is weighted by the attention layer for each of the time steps, and wherein the data processed by the recursive neural network layer is the weighted feature data.

2. The method of claim 1, wherein the attention layer is a fully connected neural network layer.

5. The method of claim 1, further comprising: generating an audio recording of a physical environment using a sound sensor of the user device.

6. The method of claim 5, further comprising: converting the audio recording into the sound recording data of the physical environment.

7. The method of claim 6, wherein the audio recording is in a non-visual image format and the sound recording data is a visual image format.

8. The method of claim 1, wherein the convolutional neural network layer, the attention layer, and the recursive neural network layer trained to generate the acoustic classification using backpropagation.

9. The method of claim 8, wherein the acoustic classification neural network is trained on training audio from one or more different environments including at least an outdoor environment and an indoor environment.

11. The method of claim 10, wherein the classification layer outputs a numerical value for each a plurality of scene categories, the numerical value indicating a likelihood that the acoustic classification is of a given scene category from the plurality of scene categories.

12. The method of claim 11, wherein the plurality of scene categories include one or more of a group comprising: a bus, a cafe, a car, a city center, a forest, a grocery store, a home, a lakeside beach, a library, a railway station, an office, a residential area, a train, a tram, and an urban park.

13. The method of claim 1, wherein the sound recording data is a spectrogram comprising frequency on a vertical axis and time on a horizontal axis.

14. A system comprising: one or more processors of a machine; and a memory comprising instructions that, when executed by the one or more processors, cause the machine to perform operations comprising: identifying sound recording data; generating an acoustic classification using from the sound recording data using an acoustic classification neural network, the acoustic classification neural network comprising a convolutional neural network layer to generate image values that are weighted by an attention layer that updates a recursive 

16. The system of claim 14, wherein the recursive neural network layer processes data from the convolutional neural network layer over time steps of recursive neural network layer.

17. The system of claim 16, wherein feature data output by the convolutional neural network layer is weighted by the attention layer for each of the time steps, and wherein the data processed by the recursive neural network layer is the weighted feature data.

15. The system of claim 14, wherein the attention layer is a fully connected neural network layer.

18. The system of claim 14, the operations further comprising: generating an audio recording of a physical environment using a sound sensor.

19. A non-transitory computer readable storage medium comprising instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising: identifying sound recording data; generating an acoustic classification using from the sound recording data using an acoustic classification neural network, the acoustic classification neural network comprising a convolutional neural network layer to generate image values that are weighted by an attention layer that updates a recursive neural network layer; and storing the acoustic classification.

20. The non-transitory computer readable storage medium of claim 19, wherein the attention layer is a fully connected neural network layer. 

3. The method of claim 1, wherein the attention neural network layer is a fully connected neural network layer.

4. The method of claim 1, further comprising: generating an audio recording of the physical environment using a sound sensor.

5. The method of claim 4, further comprising: converting the audio recording into the sound recording spectrogram data of the physical environment.

6. The method of claim 5, wherein the audio recording is in a non-visual image format and the sound recording spectrogram data is a visual image format.

2. The method of claim 1, wherein the convolutional neural network layer, the bi-directional LSTM neural network layer, the attention neural network layer, and the deep neural network layer are trained as acoustic classification neural network using backpropagation.

7. The method of claim 1, further comprising: training the convolutional neural network layer, the bi-directional LSTM neural network layer, and the deep neural network layer as a network using training audio data recorded from one or more different environments including at least an outdoor environment and an indoor environment.

10. The method of claim 9, wherein the classification layer outputs a numerical score for each of the one or more scene categories, the numerical score indicating a likelihood that the physical environment is of a given scene category from the one or more scene categories.

11. The method of claim 8, wherein the one or more scene categories include one or more of a group comprising: a bus, a cafe, a car, a city center, a forest, a grocery store, a home, a lakeside beach, a library, a railway station, an office, a residential area, a train, a tram, and an urban park.

12. The method of claim 1, wherein the sound recording spectrogram data is a spectrogram comprising frequency on a vertical axis and time on a horizontal axis.

13. A system comprising: one or more processors of a machine; and a memory comprising instructions that, when executed by the one or more processors, cause the machine to perform operations comprising: identifying sound recording spectrogram data of a physical environment; generating, using a convolutional neural network layer of an acoustic classification neural network, sound recording feature data from the sound recording spectrogram data, the acoustic classification neural network comprising the convolutional neural network layer that inputs to a bi-directional long short-term 

15. The system of claim 14, wherein the attention neural network layer is a fully connected neural network layer.

16. The system of claim 13, the operations further comprising: generating an audio recording of the physical environment using a sound sensor.

25. A non-transitory computer readable storage medium comprising instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising: identifying sound recording spectrogram data of a physical environment; generating, using a convolutional neural network layer of an acoustic classification neural network, sound recording feature data from the sound recording spectrogram data, the acoustic classification neural network comprising the convolutional neural network layer that inputs to a bi-directional long short-term memory (LSTM) neural network layer and an attention neural network layer, the bi-directional LSTM neural network layer and the attention neural network layer configured to input to a deep neural network layer to generate acoustic classifications; generating, by the bi-directional LSTM neural network layer, sound recording characterization data for each of a plurality of LSTM time steps by recursively processing each sound recording feature data-for each LSTM time step using the bi-directional LSTM neural network layer; generating, using an attention fully connected neural network layer, attention data for each of the plurality of LSTM time steps, the attention data generated by inputting the sound recording feature data from the convolutional neural network layer into the attention neural network layer; generating enhanced sound recording characterization data by combining, for each LSTM time step, the sound recording characterization data with the attention data; generating, using a deep neural network layer of the acoustic 

15. The system of claim 14, wherein the attention neural network layer is a fully connected neural network layer.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 1 recites “generating, by the user device, an acoustic classification using from the sound recording data using an acoustic classification neural network, …”. The phrase “using from” is confusing because it is unclear whether the acoustic classification is generated by “using” or “from” “the sound recording data”, which renders the claim indefinite. Claim 1 further recites “the acoustic classification neural network comprising a convolutional neural network layer to generate image values”, which is further confusing because it is unclear whether the “image values” are generated by “a convolutional neural network layer” or by the “acoustic classification neural network”. Claims 2-13 are rejected due to their dependency on claim 1.
Claim 14 is rejected for at least the same reasons described for claim 1 above, since claim 14 recites features similar to the deficient features of claim 1. Claims 15-18 are rejected due to their dependency on claim 14.
Claim 19 is rejected for at least the same reasons described for claim 1 above, since claim 19 recites features similar to the deficient features of claim 1. Claim 20 is rejected due to its dependency on claim 19.
Claim 4 recites “feature data output by the convolutional neural network layer”, its parent claim 3 recites “data from the convolutional neural network layer”, and claim 1 recites “a convolutional neural network layer to generate image values”. This is further confusing because it is unclear what the “convolutional neural network layer” generates, i.e., whether the “image values”, the “data”, or the “feature data” are generated by the “convolutional neural network layer”, which further renders the claim indefinite. Claim 4 further recites “the data processed by the recursive neural network layer” and “the weighted feature data”, and the terms “the data” and “the weighted feature data” lack sufficient antecedent basis in claim 4. It is therefore unclear what “the data” and “the weighted feature data” are and how “the data” is processed by the “recursive neural network layer”, which further renders the claim indefinite.
Claim 11 further recites “for each a plurality of scene categories”, which is confusing because it is unclear whether “each plurality of scene categories” or “a plurality of scene … Claim 12 is rejected due to its dependency on claim 11.
Claim 17 is rejected for at least the same reasons described for claim 4 above, because claim 17 recites features similar to the deficient features of claim 4.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kates (US 20100027820 A1) in view of Catanzaro et al. (US 20170148431 A1, hereinafter Catanzaro).
Claim 1: Kates teaches a method (title and abstract, ln 1-5 and a system in fig. 1 and a method, abstract) comprising:
identifying sound recording data on a user device (a hearing aid in fig. 1 and the environment sound picked up by microphones 12, 16, para 47);
generating, by the user device, an acoustic classification (the microphone signal being classified into music, speech, or noise, etc., in fig. 7; by sound environment detector 26, hearing aid processor 20, etc., in fig. 1) using from the sound recording data (the audio data from the 
storing the acoustic classification on the user device (the classification results are placed in the hearing aid for parameter map to select appropriate algorithm executed on the hearing aid processor, para 47, para 53).
However, Kates does not explicitly teach that the acoustic classification neural network comprises a convolutional neural network layer to generate image values that are weighted by an attention layer that updates a recursive neural network layer.
Catanzaro teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-12 and fig. 1) wherein an acoustic classification neural network is disclosed (including an RNN model that takes speech spectrograms 105 as input, para 51; generating a classification result, para 191) to comprise a convolutional neural network layer (including a row convolution layer following the recurrent layer in fig. 7, para 101; including the 1D or 2D invariant convolution 110 in fig. 1) to generate image values (the output from the row convolution layer in fig. 7, or the output to the recurrent or GRU bidirectional layers 115 in fig. 1) that are weighted by an attention layer (one or more fully connected layers 120, as the claimed attention layer, are applied to the previous layer $l-1$ with $h_t^l = f(W^l h_t^{l-1} + b^l)$, para 60, i.e., an attention layer added to the decoder, para 44) that updates a recursive neural network layer ($h_t^{l-1}$ is the row convolution layer as one of the recursive neural network layers, which is weighted through the equation for $h_t^l$ above, and the network parameters are further updated through the backpropagation through time algorithm, para 61-63; i.e., a recursive neural network layer is updated by $h_t^l = f(W^l h_t^{l-1} + b^l)$ and the backpropagation through time algorithm, para 61-63) for the benefit of improving performance of the system.
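For illustration only, the fully connected layer relied upon above computes, at each time step $t$, $h_t^l = f(W^l h_t^{l-1} + b^l)$ (Catanzaro, para 60), reusing the same $W^l$ and $b^l$ at every time step. The following is a minimal sketch of that per-time-step computation, assuming a ReLU nonlinearity for $f$ and small hand-picked weights. The function name fully_connected_layer is hypothetical, and the code is not taken from Catanzaro or from the instant application.

```python
def fully_connected_layer(h_prev, W, b, f=lambda x: max(0.0, x)):
    """Apply h_t^l = f(W^l h_t^{l-1} + b^l) independently at each time step t.

    h_prev: list of T vectors, the previous layer's outputs h_t^{l-1}.
    W: weight matrix W^l as a list of rows; b: bias vector b^l.
    f: element-wise nonlinearity (ReLU by default, an assumption for this sketch).
    """
    out = []
    for x in h_prev:  # one time step at a time; no connection across time steps
        z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(W, b)]
        out.append([f(z_i) for z_i in z])
    return out
```

For example, fully_connected_layer([[1.0, 2.0], [0.5, -1.0]], [[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]) returns [[0.0, 2.5], [1.5, 0.75]]; each output vector depends only on the input at the same time step.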
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the acoustic classification neural network comprising the convolutional neural network layer to generate the image values that are weighted by the attention layer that updates the recursive neural network layer, as taught by Catanzaro, to the acoustic classification neural network in the method, as taught by Kates, for the benefits discussed above.
Claim 14 has been analyzed and rejected according to claim 1 above and the combination of Kates and Catanzaro further teaches wherein one or more processors of a machine (Kates, the hearing aid as a machine, a processor 20 in fig. 1, para 20 and Catanzaro, such as smart devices as a machine, para 6, including GPU or more processors, para 124-125); and a memory comprising instructions (Kates, corresponding algorithm in non-volatile memory, para 5, and a method algorithm stored in a tangible computer-readable medium, para 29, such as fast shared memory, para 139) of the method of claim 1 that, when executed by the one or more processors (Kates, predetermined signal processing algorithm, para 46 and Catanzaro, programs to implement method steps, para 225).
Claim 19 has been analyzed and rejected according to claims 1, 14 above.
Claim 2: the combination of Kates and Catanzaro further teaches, according to claim 1 above, wherein the attention layer is a fully connected neural network layer (Catanzaro, fully connected neural network layer 120 in fig. 1, para 51).
Claim 3: the combination of Kates and Catanzaro further teaches, according to claim 1 above, wherein the recursive neural network layer processes data from the convolutional neural network layer over time steps of recursive neural network layer (Catanzaro, the output of the invariant convolution layer 110 to the recurrent or GRU bidirectional layers in fig. 1; time step l in convolution layer by equation 1 and further processed by the recursive neural network layers of equation 2, and wherein l is the time step). 
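As a non-authoritative illustration of a convolutional layer producing data that a recurrent layer can then process time step by time step, the sketch below applies a single-channel one-dimensional convolution along the time axis and emits one feature value per input time step. The kernel, the zero padding at the sequence edges, the ReLU nonlinearity, and the name conv1d_over_time are assumptions for illustration and do not reproduce Catanzaro's equation 1.

```python
def conv1d_over_time(frames, kernel, f=lambda v: max(0.0, v)):
    """Single-channel 1-D convolution along time.

    Computes h_t = f(sum over j in [-c, c] of kernel[j + c] * x_{t+j}), with
    c = len(kernel) // 2 and zero padding at both ends, so that exactly one
    output is produced per input time step. As in most neural network
    libraries, this is written as a cross-correlation (the kernel is not flipped).
    """
    c = len(kernel) // 2
    T = len(frames)
    out = []
    for t in range(T):
        acc = 0.0
        for j in range(-c, c + 1):
            if 0 <= t + j < T:  # positions outside the sequence contribute zero
                acc += kernel[j + c] * frames[t + j]
        out.append(f(acc))
    return out
```

With an identity nonlinearity, conv1d_over_time([1.0, 2.0, 3.0, 4.0], [0.25, 0.5, 0.25], f=lambda v: v) returns [1.0, 2.0, 3.0, 2.75]; the list is ordered by time step, which is the order in which a recurrent layer would consume it.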
Claim 4: the combination of Kates and Catanzaro further teaches, according to claim 3 above, wherein feature data output by the convolutional neural network layer (Catanzaro, the row convolutional layer as an output layer of the recurrent layer in fig. 7) is weighted by the attention layer for each of the time steps (Catanzaro, one or more fully connected layers 120, as the claimed attention layer, are applied to the previous layer $l-1$ with $h_t^l = f(W^l h_t^{l-1} + b^l)$, para 60, i.e., an attention layer added to the decoder, para 44; the row convolution layer as the output layer of the recurrent layer in fig. 7; and the discussion of claim 1 above), and wherein the data processed by the recursive neural network layer is the weighted feature data (Catanzaro, fig. 7, the recursive neural network layer represented by equation 3, where $W^l$ is the weight applied to the previous hidden-layer feature $h_t^{l-1}$ and $U$ is the recurrent weight matrix, para 59, further weighted by the fully connected layer in fig. 1).
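To illustrate the mapping for claim 4, the sketch below scales each time step's feature vector by a per-time-step weight and feeds the resulting weighted feature data into a simple forward recurrence of the form $h_t^l = f(W^l h_t^{l-1} + U^l h_{t-1}^l + b^l)$, with input weights $W^l$ and recurrent weights $U^l$. The scalar per-time-step weights, the zero initial hidden state, the ReLU nonlinearity, and the helper names weight_per_time_step and recurrent_layer are assumptions for illustration. Catanzaro's bidirectional layers also run a second recurrence in the reverse time direction, which this sketch omits.

```python
def weight_per_time_step(features, weights):
    """Scale the feature vector at each time step t by a scalar weight a_t (hypothetical)."""
    return [[a * v for v in feat] for a, feat in zip(weights, features)]


def recurrent_layer(x_seq, W, U, b, f=lambda v: max(0.0, v)):
    """Forward recurrence h_t^l = f(W^l h_t^{l-1} + U^l h_{t-1}^l + b^l).

    x_seq: list of T input vectors h_t^{l-1} (here, the weighted feature data).
    W: input weights W^l; U: recurrent weights U^l (both lists of rows); b: bias.
    The hidden state starts at zeros, an assumption for this sketch.
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    h = [0.0] * len(b)
    states = []
    for x in x_seq:  # time steps are processed in order; h carries state forward
        z = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
        h = [f(zi) for zi in z]
        states.append(h)
    return states
```

For example, weighting the features [[1.0], [2.0], [3.0]] by [1.0, 0.5, 0.0] gives [[1.0], [1.0], [0.0]], and recurrent_layer on that input with W = [[1.0]], U = [[0.5]], b = [0.0] returns [[1.0], [1.5], [0.75]]; the last state is nonzero only because of the recurrent term.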
Claim 5: the combination of Kates and Catanzaro further teaches, according to claim 1 above, the method further comprising:

Claim 6: the combination of Kates and Catanzaro further teaches, according to claim 5 above, the method further comprising:
converting the audio recording into the sound recording data of the physical environment (Kates, audio recordings from the microphones 12, 16 in fig. 1 and Catanzaro, a spectrogram of power normalized audio clips is used as the features to the system, para 54, and thus converting the audio clips into the spectrogram is inherent in providing the features to the system).
Claim 7: the combination of Kates and Catanzaro further teaches, according to claim 6 above, wherein the audio recording is in a non-visual image format (Kates, the audio signal recorded by the microphones 12, 16) and the sound recording data is a visual image format (the spectrogram, para 54; a spectrogram is inherently a visual image format representing frequency and time, https://en.wikipedia.org/wiki/Spectrogram).
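The conversion relied upon for claims 6 and 7, and the axis orientation relied upon for claim 13, can be illustrated with a short-time Fourier transform. The sketch below uses a plain DFT and a Hann window for readability; it turns a one-dimensional list of audio samples into a two-dimensional magnitude array whose rows are frequency bins and whose columns are time frames, i.e., an array that can be rendered as an image with frequency on the vertical axis and time on the horizontal axis. The frame length, hop size, window, and function name are illustrative assumptions, not parameters disclosed by Kates or Catanzaro.

```python
import cmath
import math


def spectrogram(samples, frame_len, hop):
    """Magnitude spectrogram as a 2-D list: rows are frequency bins
    (vertical axis), columns are time frames (horizontal axis).

    Uses a periodic Hann window and a direct DFT for clarity rather than speed.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        # Keep bins 0..frame_len // 2, the non-negative frequencies of a real signal.
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(seg)))
                for k in range(frame_len // 2 + 1)]
        frames.append(mags)
    # Transpose so frequency runs down the rows and time runs across the columns.
    return [list(row) for row in zip(*frames)]
```

For a pure tone at bin 4 of a 32-sample frame (128 samples, hop of 16), the result has 17 frequency rows and 7 time columns, and every column peaks at row 4.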
Claim 8: the combination of Kates and Catanzaro further teaches, according to claim 1 above, wherein the convolutional neural network layer, the attention layer, and the recursive neural network layer trained to generate the acoustic classification using backpropagation (Kates, hidden layer consisting of 16 neurons, output layers, being trained by resilient back propagation algorithm, para 73, 75 and Catanzaro, training the deep recurrent networks by back propagation, para 144 and wherein the deep recurrent networks inherently include the 
Claim 9: the combination of Kates and Catanzaro further teaches, according to claim 8 above, wherein the acoustic classification neural network is trained on training audio from one or more different environments including at least an outdoor environment and an indoor environment (Kates, restaurant clatter, i.e., indoor environment, and traffic noise scene, i.e., outdoor, para 19 and Catanzaro, including the environment such as a café, a bus, i.e., indoor, and a street, a pedestrian area, i.e., outdoor, para 170).
Claim 10: the combination of Kates and Catanzaro further teaches, according to claim 1 above, wherein the acoustic classification neural network further comprises a classification layer that generates the acoustic classification (Kates, neural network with fully-passed features in fig. 6 to the next layer as indicated in fig. 5, and Catanzaro, CTC (Connectionist Temporal Classification) used to predict speech transcriptions from audio, i.e., classification, para 37; the CTC-RNN model for speech recognition, para 45, and the implementation in figs. 10-11; the CTC layer includes a softmax layer, para 53).
Claim 11: the combination of Kates and Catanzaro further teaches, according to claim 10 above, wherein the classification layer outputs a numerical value for each a plurality of scene categories, the numerical value indicating a likelihood that the acoustic classification is of a given scene category from the plurality of scene categories (Kates, environment classifier 32 with numbers describing classification accuracies in tables 2-4 in figs. 8-10, para 88, and Catanzaro, re-scoring the outputs of the deep neural network, para 44).
Claim 12: the combination of Kates and Catanzaro further teaches, according to claim 11 above, wherein the plurality of scene categories include one or more of a group comprising: a bus, a cafe, a car, a city center, a forest, a grocery store, a home, a lakeside beach, a library, a railway station, an office, a residential area, a train, a tram, and an urban park (Kates, restaurant, traffic noise, para 19, and Catanzaro, including environments such as a café, a bus, a street, a pedestrian area, etc., para 170).
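As an illustration of a classification layer that outputs one numerical value per scene category indicating a likelihood (claims 10-12), the sketch below applies a softmax, the function included in Catanzaro's CTC layer (para 53), to a list of raw scores. The resulting values lie between 0 and 1 and sum to 1. The raw scores are hypothetical, only five of the categories recited in claim 12 are used for brevity, and the helper names are assumptions rather than code from either reference.

```python
import math

# Five of the scene categories recited in claim 12 (subset chosen for brevity).
SCENES = ["bus", "cafe", "car", "city center", "forest"]


def softmax(logits):
    """Convert raw classifier scores into values in (0, 1) that sum to 1."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, categories=SCENES):
    """Map each scene category to its softmax value (one score per category)."""
    return dict(zip(categories, softmax(logits)))
```

For example, classify([2.0, 1.0, 0.1, 0.0, -1.0]) assigns the largest value to "bus", and equal scores yield 0.2 for every category.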
Claim 13: the combination of Kates and Catanzaro further teaches, according to claim 1 above, wherein the sound recording data is a spectrogram comprising frequency on a vertical axis and time on a horizontal axis (Catanzaro, speech spectrograms 105 as input samples to the neural network in fig. 1 and the time series of spectrogram frames x(t), para 55; time on a horizontal axis and frequency on a vertical axis are inherent in a spectrogram, https://en.wikipedia.org/wiki/Spectrogram).
Claim 15 has been analyzed and rejected according to claims 14, 2 above.
Claim 16 has been analyzed and rejected according to claims 14, 3 above.
Claim 17 has been analyzed and rejected according to claims 16, 4 above.
Claim 18 has been analyzed and rejected according to claims 14, 5 above.
Claim 20 has been analyzed and rejected according to claims 19, 2 above.

The prior art made of record and not relied upon (Peng Zhou et al., “Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification,” Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, August 7-12, 2016, pages 207-212) is considered pertinent to applicant's disclosure because

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654