Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1, 6-12, 17, and 23-24 are pending. Claims 1, 12, and 23-24 are independent.  Claims 1, 8, and 12 are amended and Claims 4-5 and 15-16 are canceled by the most recent amendments.
This Application was published as U.S. 2021/0193167.
Apparent foreign priority October 2017.  
Note that the terminology of the instant Application (a translation from Chinese) deviates from the ordinary meaning of the terms “positive example” and “counter-example.”  See Response to Arguments below.
Pending Claims are allowed.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 7/20/2022 has been entered.
Response to Amendments
The rejection of Claim 8 under 112(b) for antecedent basis issues is withdrawn in view of the amendments to this Claim.  (Note the 112(b) issues in Claims 1 and 12.)

Response to Arguments
The instant Claims as amended include training a classifier to purge from an audio fingerprint database those audio fingerprints that were formed from audio samples having certain types of acoustic features which are susceptible to audio attacks.  The Claims train the classifier to identify acoustic features that are corrupted by an audio attack.  Then, the trained classifier searches the audio fingerprint database and identifies the audio fingerprints that are based on audio samples that include the susceptible acoustic features.
1. An audio recognition method, comprising: 
acquiring an audio file to be recognized; 
extracting audio fingerprints of the audio file to be recognized; and 
searching audio attribute information matched with the audio fingerprints, in a fingerprint index database; 
wherein, the fingerprint index database comprises first audio fingerprints corresponding to audio samples; the first audio fingerprints are audio fingerprints in which invalid audio fingerprints have been removed from the audio samples by a classifier; and each of the audio samples has corresponding audio attribute information;
wherein the classifier is established through following operations: 
extracting feature point data of audio data in a training data set as first feature point data; 
performing an audio attack on the audio data in the training data set, and extracting feature point data of audio data in the training data set after performing the audio attack as second feature point data; 
comparing the first feature point data with the second feature point data, marking disappeared or moved feature point data as counter-example data, and marking feature point data with robustness as positive example data; and 
training and establishing the classifier by using the first feature point data, the positive example data, and the counter-example data;
wherein the classifier filters the audio samples, and removes feature point data, determined as counter-example data, as invalid audio fingerprints; 
wherein an operation that the classifier filters the audio samples, and removes the feature point data, determined as the counter-example data, as the invalid audio fingerprints, comprises: 
extracting feature point data of the audio samples; 
inputting the extracted feature point data into the classifier; and 
removing the feature point data, determined as the counter-example data, as the invalid audio fingerprints, according to a result of positive example data or counter-example data output by the classifier. 

12. An audio recognition device, comprising: 
one or more processors; 
a memory; and 
one or more application programs, wherein the one or more application programs are stored in the memory and configures to be executed by the one or more processors to: 
acquire an audio file to be recognized; 
extract audio fingerprints of the audio file to be recognized; and 
search audio attribute information matched with the audio fingerprints, in a fingerprint index database; 
wherein, the fingerprint index database comprises first audio fingerprints corresponding to audio samples; the first audio fingerprints are audio fingerprints in which invalid audio fingerprints have been removed from the audio samples by a classifier; and each of the audio samples has corresponding audio attribute information; 
wherein the classifier is established through following operations: 
extracting feature point data of audio data in a training data set as first feature point data; 
performing an audio attack on the audio data in the training data set, and extracting feature point data of audio data in the training data set after performing the audio attack as second feature point data; 
comparing the first feature point data with the second feature point data, marking disappeared or moved feature point data as counter-example data, and marking feature point data with robustness as positive example data; and 
training and establishing the classifier by using the first feature point data, the positive example data, and the counter-example data;
wherein the classifier filters the audio samples, and removes feature point data, determined as counter-example data, as invalid audio fingerprints; 
wherein an operation that the classifier filters the audio samples, and removes the feature point data, determined as the counter-example data, as the invalid audio fingerprints, comprises: 
extracting feature point data of the audio samples; 
inputting the extracted feature point data into the classifier; and 
removing the feature point data, determined as the counter-example data, as the invalid audio fingerprints, according to a result of positive example data or counter-example data output by the classifier. 
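
For illustration only, the labeling, training, and filtering operations recited in Claims 1 and 12 can be sketched as follows. This is a minimal sketch under stated assumptions, not the Applicant's implementation: a feature point is modeled as a (frame, frequency bin) position with an energy value, a point counts as "disappeared or moved" when its position is absent after the attack, and the classifier is a hypothetical energy threshold that assumes robust points tend to have higher energy. All function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]              # (frame index, frequency bin); assumed representation
FeaturePoints = Dict[Position, float]   # position -> energy at that position


def label_feature_points(first: FeaturePoints,
                         second: FeaturePoints) -> Tuple[List[float], List[float]]:
    """Compare feature points before (first) and after (second) an audio attack.

    Points whose position survives the attack are treated as robust (positive examples).
    Points whose position is absent afterwards disappeared or moved (counter-examples).
    """
    positive = [energy for pos, energy in first.items() if pos in second]
    counter = [energy for pos, energy in first.items() if pos not in second]
    return positive, counter


def train_threshold_classifier(positive: List[float],
                               counter: List[float]) -> Callable[[float], bool]:
    """Hypothetical classifier: threshold midway between the two class means.

    Returns True for a predicted positive example. Both lists must be non-empty.
    """
    threshold = (mean(positive) + mean(counter)) / 2.0
    return lambda energy: energy >= threshold


def remove_invalid(sample_points: FeaturePoints,
                   classifier: Callable[[float], bool]) -> FeaturePoints:
    """Drop points the classifier outputs as counter-examples (invalid fingerprints)."""
    return {pos: e for pos, e in sample_points.items() if classifier(e)}


# Training data: (2, 5) moved to (2, 6) and (3, 7) disappeared after the attack.
first = {(0, 10): 9.0, (1, 12): 8.0, (2, 5): 2.0, (3, 7): 1.0}
second = {(0, 10): 8.5, (1, 12): 7.5, (2, 6): 1.5}

positive, counter = label_feature_points(first, second)
classifier = train_threshold_classifier(positive, counter)

# Filtering an audio sample of the fingerprint index database.
sample = {(0, 3): 7.0, (4, 9): 3.0}
kept = remove_invalid(sample, classifier)
```

In this sketch the labels are produced by the comparison step and consumed by training, while the trained classifier later outputs the same two labels for unseen feature points, which is the usage the Claims recite.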

Terminology:
The phrases “positive example” and “counter-example” have a specific meaning in the context of training a model, which is different from the meaning used in the instant Application.  “Positive example” and “counter-example” (or “negative example”) usually refer to data that are used for training a classifier model.  The positive examples tell the model what to look for and the negative examples tell the model what to avoid.  Thus, these examples are used during the training and for the training of the model.  They are not outputs of a trained model.  See, e.g., Brand (U.S. 6,112,021), Abstract; Sarikaya (U.S. 9519870) “16. A computerized system comprising: one or more processors; and a plurality of components that include computer-executable instructions that are executed by the one or more processors, the components including: a model building component that trains a classifier model for an entity class using positive sample entities that belong to the entity class and negative sample entities that do not belong to the entity class, ….”
In contrast, in the Claim limitation that follows, the trained classifier model is “Used” to remove the “counter-examples” / “negative examples”:  “removing the feature point data, determined as the counter-example data, as the invalid audio fingerprints, according to a result of positive example data or counter-example data output by the classifier.”  See also:  “[0111] After a classifier is established, the classifier is used to filter the above audio sample data, and the feature point data determined as counter-example data is removed as invalid audio fingerprints. At the same time, invalid audio fingerprints derived from the feature point data determined as the counter-example data can be removed….”  Published Application.  Thus, in the context of the instant Application, “positive example data” and “counter-example data” have a specific meaning as defined by the claimed language which is consistent with the Specification.

Additionally:
“Feature Point Data” = certain types of acoustic features.  See “[0101] … The first feature point data is feature data capable of reflecting audio attributes of the samples, such as dividing each audio file into multiple audio frames, then, the first feature point data can include at least one of the following: energy of an audio frame where the local maximum points are located, energy of a frequency where the local maximum points are located, and a ratio of energy of a frequency where the local maximum points are located to energy of the audio frame where the local maximum points are located, the number of the local maximum points in the audio frame, energy of an audio frame near the audio frame in time dimension, or energy distribution of points around the local maximum points.”
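
As an illustration of the kinds of quantities listed in paragraph [0101], the sketch below computes a few of them for the local maximum points of a small spectrogram. It is a sketch under assumptions not found in the record: the spectrogram is a list of frames of per-bin energies, a local maximum is a cell strictly greater than its existing 4-neighbors, and the "energy of a frequency" is read as the energy of the bin at the local maximum. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Spectrogram = List[List[float]]   # spectrogram[frame][bin] = energy (assumed layout)
Position = Tuple[int, int]        # (frame index, frequency bin)


def local_maxima(spec: Spectrogram) -> List[Position]:
    """Return positions whose energy strictly exceeds every existing 4-neighbor."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, energy in enumerate(frame):
            neighbors = []
            if t > 0:
                neighbors.append(spec[t - 1][f])
            if t + 1 < len(spec):
                neighbors.append(spec[t + 1][f])
            if f > 0:
                neighbors.append(frame[f - 1])
            if f + 1 < len(frame):
                neighbors.append(frame[f + 1])
            if all(energy > n for n in neighbors):
                peaks.append((t, f))
    return peaks


def feature_point_data(spec: Spectrogram) -> Dict[Position, Dict[str, float]]:
    """Compute per-peak features of the kind listed in paragraph [0101]."""
    peaks = local_maxima(spec)
    features = {}
    for t, f in peaks:
        frame_energy = sum(spec[t])   # energy of the audio frame containing the peak
        point_energy = spec[t][f]     # energy of the frequency bin at the peak
        features[(t, f)] = {
            "frame_energy": frame_energy,
            "point_energy": point_energy,
            "energy_ratio": point_energy / frame_energy if frame_energy else 0.0,
            "peaks_in_frame": sum(1 for pt, _ in peaks if pt == t),
        }
    return features


spec = [
    [1.0, 5.0, 1.0],
    [1.0, 2.0, 1.0],
    [4.0, 1.0, 1.0],
]
features = feature_point_data(spec)
```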
In cleaning the fingerprint database, the fingerprints that are derived from audio samples with susceptible “feature point data” / “acoustic features” are removed:  “[0113] In one implementation mode, the classifier filters the audio sample data, and removes feature point data, determined as counter-example data, as invalid audio fingerprints, and the steps can include:”
Allowable Subject Matter
Pending Claims 1, 6-12, 17, and 23-24 are allowed.
The following is an examiner’s statement of reasons for allowance: In view of each of the particular limitations of the independent Claims when considered in the order established by the Claim language and in the context of the language of the independent Claims when each Claim is considered as a whole, the independent Claims of this Application were not found in the prior art that was viewed.
In particular, the prior art was not found to teach training a classifier to identify “feature point data” (certain types of acoustic features; see Claim 7 for examples of “feature point data”) that are robust to audio attacks as well as “feature point data” that are susceptible to audio attacks, and then using the classifier, thus trained, to clean up an audio fingerprint database (“fingerprint index database”) by removing those “audio fingerprints” that are based on “audio samples” which include “feature point data” that are susceptible to audio attacks, when considered in the context of the independent Claims and including all of the limitations of such Claims.
The instant Application removes the suspect audio fingerprints by referring to the audio sample from which each fingerprint was derived and scrutinizing certain acoustic aspects of the underlying audio sample.  (Somewhat similar to the 2002 movie Minority Report, where “a specialized police department, apprehends criminals based on foreknowledge.”)
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Close Art of Record
Refer to the extensive list and discussion of references in the Final Rejection of 4/20/2022.
Audio Attacks and Classifier Development
Bauer (U.S. 20130279740): [Four greyscale figure excerpts from the reference, media_image1.png through media_image4.png, are not reproduced here.]

Bauer (U.S. 8700194):  “Audio fingerprints may be derived from an audio clip, sequence, segment, portion or the like, which is perceptually encoded. Thus the audio sequence may be accurately identified by comparison to its fingerprint, even after compression, decompression, transcoding and other changes to the content made with perceptually based audio codecs; even codecs that involve lossy compression, which may thus tend to degrade audio content quality (which may be practically imperceptible to detection). Moreover, audio fingerprints may function robustly over degraded signal quality of its corresponding content and a variety of attacks or situations such as off-speed playback.”  Col. 2, lines 33-43.

Removing Fingerprints from the Database:
Burges (U.S. 20050091275):  “[0008] The present invention relates to a system and method for detecting duplicate or corrupted audio files to facilitate management and removal of such files. Managing large audio collections is difficult since compared to images and text, for example, it is problematical to quickly parse large audio files. In the past, users have relied on labeling, which may be inaccurate. The present invention solves many of the drawbacks and shortcomings of conventional systems by providing tool for assisting the user in searching audio files, identifying files that may be duplicates of one another, identifying corrupted, noisy, or junk files, and facilitating removal of such files from a user's database. In one aspect, the user supplies two parameters to the system (the number of seconds (t) from the beginning of the audio in order to extract a fingerprint, and the size of a slop window (s)). The present invention then locates the user's audio files and computes a fingerprint based in part on (t) and (s). A user interface is provided to configure these and other parameters along with enabling users to remove duplicate or corrupted files that are automatically determined.”

General Fingerprinting:
Wang (U.S. 20170060882): “[0086] It should be noted that, the concept for the video/audio retrieval systems and methods can be extended to other services. For example, the disclosed video/audio retrieval methods and systems may be integrated on smart TV systems and/or smart terminals to help organize and share produced information valuable to assist in detecting and removing copyright infringing or perceptually identical video/audio content from the databases of such websites and prevent any future uploads made by users of these websites, to perform image and/or audio based identification or recognition, etc. Further, the video/audio fingerprinting may also be used for broadcast monitoring (e.g., advertisement monitoring, news monitoring) and general media monitoring. Broadcast monitoring solutions can inform content providers and content owners with play lists of when and where their video/audio content was used.”
Liu (U.S. 20160301972):  “[0109] The server device then stores the extracted audio fingerprints in the database 603. The audio fingerprints stored in the database 603 can be used to be compared with the audio fingerprint extracted from the audio signals of the TV advertisement watched by the user, as described above with respect to S604. In some embodiments, audio fingerprints associated with TV advertisements broadcast in a TV channel that are stored in the database 603 can be periodically updated. As a result, at any given time, audio fingerprints associated with the TV advertisement(s) that is currently broadcast or most recently broadcast via the TV channel are stored in the database 603, while audio fingerprints associated with outdated TV advertisements broadcast via the TV channel are removed from the database 603. For example, the database 603 can be configured to store only audio fingerprints of TV advertisements that have been broadcast via the group of TV channels in the last 10 minutes. For another example, the database 603 can be configured to store up to only ten most recent audio fingerprints of TV advertisements that have been broadcast via each TV channel from the group of TV channels. In such a method, the database 603 can store audio fingerprints of TV advertisements that are most recently broadcast without a need to constantly expand the storage of the database 603.”
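
The retention policy described in the Liu passage (keeping only the ten most recent fingerprints per channel) can be sketched with a bounded queue. This is an illustrative sketch only, not Liu's implementation; the channel name, the fingerprint strings, and the function name are hypothetical.

```python
from collections import defaultdict, deque

MAX_PER_CHANNEL = 10  # "up to only ten most recent audio fingerprints" per channel

# channel name -> bounded queue; appending to a full deque discards the oldest entry
database = defaultdict(lambda: deque(maxlen=MAX_PER_CHANNEL))


def store_fingerprint(channel: str, fingerprint: str) -> None:
    """Store a fingerprint; outdated fingerprints for the channel fall out automatically."""
    database[channel].append(fingerprint)


# Twelve fingerprints arrive; only the ten most recent remain stored.
for i in range(12):
    store_fingerprint("channel-1", f"fp-{i}")
```

Note that this removal is driven by age alone, whereas the instant Claims remove fingerprints based on the classifier's assessment of the underlying feature point data.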
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached from 9 to 5, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659