DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Applicant’s RCE filed on November 29, 2021, in which the Applicant has amended claims 1-3, 6-8, 12, and 15-17 and cancelled claims 4, 9, and 13-14.
By virtue of this communication, claims 1-3, 5-8, 10-12, and 15-17 are currently pending in this Office Action.
With respect to the objection to the drawings regarding the features recited in claim 9, as set forth in the previous Office Action, the Applicant’s claim amendment, including the cancellation of claim 9, and the argument at paragraph 4 of page 7 of the Remarks filed on November 29, 2021, have been fully considered, and the argument is persuasive. Therefore, the objection to the drawings regarding the features recited in claim 9 has been withdrawn.
With respect to the rejection of claim 9 under 35 USC § 112(a), as set forth in the previous Office Action, the rejection has been withdrawn in view of the Applicant’s claim amendment, including the cancellation of claim 9.
With respect to the rejection of claim 9 under 35 USC § 112(b), as set forth in the previous Office Action, the rejection has been withdrawn in view of the Applicant’s claim amendment, including the cancellation of claim 9.


Foreign Priority
The text of those sections of Title Foreign Priority not included in this action can be found in a prior Office Action mailed on January 25, 2021.

Claim Objections
Claims 1-3, 5-8, 10-12, 15-17 are objected to because of the following informalities: 
Claim 1 recites “one or more of the identified points of inflection as landmarks, each of the landmarks …” which should be -- one or more of the identified points of inflection as the landmarks, each of the landmarks …-- if “landmarks” herein refers back to “a set of landmarks in a section of the audio stream” as recited in claim 1. Claims 2-3, 5-8, 10-12, and 17 are objected to due to their dependency on claim 1.
Claim 15 is objected to for at least the same reason as described for claim 1 above, since claim 15 recites a deficient feature similar to that recited in claim 1.
Claim 16 is objected to for at least the same reason as described for claim 1 above, since claim 16 recites a deficient feature similar to that recited in claim 1.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al (US 20020083060 A1, hereinafter Wang) in view of Jaap et al (“A Highly Robust Audio Fingerprinting System”, http://ismir2002.ircam.fr/proceedings/02-FP04-2.pdf, p.1-9, 2002, hereinafter Jaap).
Claim 1: Wang teaches a method of answering machine (title and abstract, ln 1-8 and method steps in fig. 1 and a system in fig. 2 and detecting specific audio signal such as advertisement, music, i.e., acoustic message stored in an answering machine, para [0079]), comprising:
receiving an audio stream (receiving power norm samples of captured audio samples at step 12, e.g., 64 samples in fig. 5, the audio samples including music, radio broadcast programs, and advertisements, para [0037]; or L4 norm format of the audio samples in a time-domain in fig. 5; or spectrum samples in time-frequency domain in figs. 7A-7C);
identifying a set of landmarks in a section of the audio stream (fig. 5, e.g., 10 second samples, para [0038]; at step 14, calculating landmarks for media sample in a size, para [0039]);

comparing the derived audio fingerprint (the derived fingerprints above), with any one of a plurality of stored audio fingerprints (retrieved sets of fingerprints stored in a database by index 18 at step for comparing to the computed fingerprints in fig. 1, para [0040]); and
determining that the received audio stream is a recorded message of an answering machine (stored in an answering machine, para [0079]) if the derived audio fingerprint is substantially equivalent to one of the plurality of stored audio fingerprints (file identifier with highest score or largest number of linearly related correspondences is located in the database containing a large number of known media files, para [0037]; further returned and having the same fingerprints, para [0040]),
wherein identifying the set of landmarks in the section of the audio stream includes identifying points of inflection in the audio stream (Power Norm, e.g., L4 norm with dashed lines in fig. 5) and using one or more of the identified points of inflection as landmarks (the dashed line at local maxima is chosen as landmarks, para [0052]-[0053]), each of the landmarks is a single sample value of the audio stream (e.g., peaks and valley in fig. 5), each of the landmarks is one of a peak sample in the audio stream or a trough sample in the audio stream (e.g., peaks dashed lines marked as peaks in fig. 5).
However, Wang does not explicitly teach wherein comparing the derived audio fingerprint with any one of the plurality of stored audio fingerprints includes aligning a first origin landmark of the derived audio fingerprint with a second origin landmark of the stored audio fingerprint.
Jaap teaches an analogous field of endeavor by disclosing a method of audio fingerprint detection (title and abstract, ln 1-19 and fig. 1) and the method comprising:
receiving an audio stream (collection of sub-fingerprint as single frame, e.g., F(n, 0), F(n, 1) …, F(n, 31) etc. in fig. 1, Section 4.2 Extraction Algorithm, p.4, col 1);
identifying a set of landmarks in a section of the audio stream (identifying bit 0 as a black pixel while bit 1 as a white pixel in fig. 2, 4.2 Extraction Algorithm, p.4, col 2);
deriving an audio fingerprint for the section of the audio stream from the set of landmarks identified (set of combination of bit 0 and bit 1 constitutes 32-bit data in a time frames in fig. 2, 4.2 Extraction Algorithm, p.4, col 2);
comparing the derived audio fingerprint with any one of a plurality of stored audio fingerprints (finding unknown extracted fingerprints in fingerprint database by block comparison, section 5.1 Search Algorithm, p.6, col 1) and wherein comparing the derived audio fingerprint with any one of the plurality of stored audio fingerprints includes aligning a first origin landmark of the derived audio fingerprint with a second origin landmark of the stored audio fingerprint (aligning the extracted fingerprint block with the lookup table LUT of the database in fig. 6, indicated by arrows from “Fingerprint Block” to the LUT and then to each of the sub-fingerprints of song 1, song 2, …, song N in fig. 6, and for finding optimal positions in the database and having an error-free or exact match, e.g., 17 out of the 256 sub-fingerprints are error-free, section 5.1 Search Algorithm, p.6, col 2) for benefits of achieving an improvement of audio recognition by highly robust fingerprint extraction and very efficient fingerprint search strategy (abstract, section 1. Introduction, col 2, items 1, 2, and 3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the comparing the derived audio fingerprint with any one of a plurality of stored audio fingerprints and wherein comparing the derived audio fingerprint with any one of the plurality of stored audio fingerprints includes aligning the first origin landmark of the derived audio fingerprint with the second origin landmark of the stored audio fingerprint, as taught by Jaap, to comparing the derived audio fingerprint with any one of a plurality of stored audio fingerprints, as taught by Wang, for the benefits discussed above.
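For illustration only, the origin-landmark alignment recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Wang, Jaap, or the application: the `(position, value)` landmark representation, the function names, and the `min_overlap` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a landmark is (sample position, sample value).
Landmark = Tuple[int, int]


def align_to_origin(landmarks: List[Landmark], origin: int) -> List[Landmark]:
    """Re-express every landmark position relative to the chosen origin landmark."""
    origin_pos = landmarks[origin][0]
    return [(pos - origin_pos, val) for pos, val in landmarks]


def fingerprints_match(derived: List[Landmark], derived_origin: int,
                       stored: List[Landmark], stored_origin: int,
                       min_overlap: float = 1.0) -> bool:
    """Align both fingerprints on their origin landmarks, then compare.

    The overlap ratio is an assumed stand-in for "substantially equivalent".
    """
    if not derived or not stored:
        return False
    a = set(align_to_origin(derived, derived_origin))
    b = set(align_to_origin(stored, stored_origin))
    shared = len(a & b)
    return shared / max(len(a), len(b)) >= min_overlap
```

Once both fingerprints are expressed relative to their own origin landmark, the same landmark pattern compares equal regardless of where it occurs in each stream.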
Claim 15 has been analyzed and rejected according to claim 1 above and the combination of Wang and Jaap further teaches a non-transitory computer-readable medium storing one or more processor-executable instructions (Wang, RAM and ROM with instructions and p.12, para 110), which, when executed by at least one processor, cause the at least one processor to perform the operations of claim 1 (Wang, CPUs executing the software programs and p.4, para 42).
Claim 16 has been analyzed and rejected according to claims 1 and 15 above and the combination of Wang and Jaap further teaches a system comprising: a memory (Wang, computer memory such as RAM and ROM and p.10, para 110); and at least one processor coupled to the memory (Wang, CPUs to execute the software program, e.g., Intel™-based personal computer or other workstation, p.4, para 42 and thus, the processors being coupled to the memory is inherent in the Intel™-based personal computer for retrieving and executing the software programs stored in the memory).
Claim 2: the combination of Wang and Jaap further teaches, according to claim 1 above, wherein deriving the audio fingerprint comprises identifying the relative locations of the set of landmarks (Wang, landmarks and fingerprints pairs and the landmarks occurring at particular locations that has been determined, e.g., in fig. 4, para [0039]).
Claim 3: the combination of Wang and Jaap further teaches, according to claim 2 above, wherein deriving the audio fingerprint further comprises identifying a value relating to each of the landmarks (Wang, e.g., each of the locations of the landmarks in fig. 4 and having time offset values with respect to the start of the segment, para [0049]).
Claim 5: the combination of Wang and Jaap further teaches, according to claim 1 above, wherein identifying the points of inflection in the audio stream comprises stepping through sample values making up the audio stream (Wang, considering the audio signal as an L4 norm format as the received audio stream in fig. 5; general spectral Lp Norm is calculated at each time along the sound signal, and Lp norm for each time slice, e.g., multiple 64 time samples, as sum of the pth power of the absolute values of the spectral components, para [0052]), comparing an amplitude of each sample value with an amplitude of at least an immediately adjacent sample value, and determining whether each sample value comprises a point of inflection (Wang, a peak or landmark chosen as the local maxima of the resulting values over time, i.e., also point of inflection, para [0052]).
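For illustration only, the stepping-and-comparing operation recited in claim 5 can be sketched as below. This is a minimal sketch, assuming that a "point of inflection" means a strict local peak or trough relative to both immediately adjacent samples; the function name is hypothetical and the code is not taken from Wang or from the application.

```python
from typing import List


def find_inflection_points(samples: List[float]) -> List[int]:
    """Return indices of samples that are strict local peaks or troughs.

    Assumption: the first and last samples are skipped because they
    have only one immediately adjacent sample.
    """
    points = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        is_peak = cur > prev and cur > nxt
        is_trough = cur < prev and cur < nxt
        if is_peak or is_trough:
            points.append(i)
    return points
```

For the sequence `[0, 2, 1, -3, -1, 4, 4, 0]`, this sketch reports index 1 (a peak) and index 3 (a trough); the flat run of equal values at indices 5 and 6 is not reported because neither sample strictly exceeds both neighbours.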


Claims 6-7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (above) in view of Jaap (above) and Vlack et al (US 20130259211 A1, USPGPub of the US 8681950 B2, IDS submitted on September 9, 2019).
Claim 6: the combination of Wang and Jaap further teaches, according to claim 5 above, determining a difference in amplitude between a point of inflection in the audio stream (Wang, considering the audio signal as an L4 norm format as the received audio stream in fig. 5) and an immediately adjacent sample value (Wang, maxima at dashed line in time line in fig. 5, i.e., comparing the energy of neighboring sample points is inherent in finding the maxima in fig. 5), but does not explicitly teach a threshold, or identifying one or more persistent landmarks, any one of the persistent landmarks being identified by determining whether the difference in amplitude between a point of inflection in the audio stream and an immediately adjacent sample value exceeds the threshold, and when the threshold is exceeded, identifying that point of inflection as a persistent landmark.
Vlack teaches an analogous field of endeavor by disclosing a method for recorded message detection (title and abstract, ln 1-10 and method steps in fig. 2 and for Answering Machine Detection AMD, para [0005]) and wherein a threshold is disclosed (e.g., zero used in equation F(n, m), para [0056]-[0057]) and wherein determining whether a difference in amplitude between a point of inflection in the audio stream and an immediately adjacent sample value (energy difference value between the neighboring frequency bands and neighboring signal frame in the spectrogram domain in fig. 3B and difference {E(n, m) - E(n, m+1) – E(n-1, m) – E(n-1, m+1), para [0056]-[0057]) exceeds the threshold is performed (N-bit binary fingerprint frame values F(n, m) is either 1 or 0 obtained by calculating the difference of the energy and comparing to value zero), and when the threshold is exceeded, identifying that point of inflection as any one of persistent landmarks and identifying one or more persistent landmarks (N-bit binary fingerprint frame values F(n, m) = 1 in audio signal spectrogram domain in fig. 4 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the threshold and wherein identifying the one or more persistent landmarks, any one of the persistent landmarks being identified by determining whether the disclosed difference in the amplitude between the point of inflection in the audio stream and the immediately adjacent sample value exceeds the threshold, and when the threshold is exceeded, identifying that point of inflection as a persistent landmark, as taught by Vlack, to the difference of sample energy of the time domain in the method for recorded message detection, as taught by the combination of Wang and Jaap, for the benefits discussed above.
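For illustration only, the threshold test recited in claim 6 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Wang, Jaap, Vlack, or the application: the function name is hypothetical, the absolute difference is used, and a point is kept when the difference to at least one immediately adjacent sample exceeds the threshold (requiring both neighbours would be an equally plausible reading).

```python
from typing import List


def persistent_landmarks(samples: List[float],
                         inflection_indices: List[int],
                         threshold: float) -> List[int]:
    """Keep inflection points whose amplitude differs from an adjacent
    sample by more than `threshold` (assumed: any one neighbour suffices)."""
    persistent = []
    for i in inflection_indices:
        neighbours = []
        if i > 0:
            neighbours.append(samples[i - 1])
        if i < len(samples) - 1:
            neighbours.append(samples[i + 1])
        if any(abs(samples[i] - n) > threshold for n in neighbours):
            persistent.append(i)
    return persistent
```

Raising the threshold discards shallow peaks and troughs, leaving only the points of inflection that stand out clearly from their immediate surroundings.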
Claim 7 has been analyzed and rejected according to claims 5-6 above, and the combination of Wang, Jaap, and Vlack further teaches, wherein identifying sample values within the persistent landmarks that have absolute amplitude values greater than a sample value of a neighbouring landmark (Vlack, N-bit binary fingerprint frame values F(n, m) = 1 in audio signal spectrogram domain in fig. 4 and used for locating and identifying the audio sample of the call audio signals and p.4, para 59) and designating the identified sample values as origin landmarks (Vlack, naming of the N-bit binary fingerprint frame having values F(n, m) = 1).
Claim 17 has been analyzed and rejected according to claims 5, 7 above.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (above) in view of Jaap (above) and Richards (US 20030191764 A1).
Claim 10: the combination of Wang and Jaap teaches all the elements of claim 10, according to claim 1 above, including adding the derived audio fingerprint to a database (new fingerprints are added to the front of the second buffer, para [0108]), except adding the derived audio fingerprint to a database when there is no match with a stored audio fingerprint.
Richards teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-12 and fig. 1) and wherein adding a derived audio fingerprint (at step 106 in fig. 1; audio file, abstract) to a database (insert file fingerprint in database at step 112 in fig. 1) when there is no match with a stored audio fingerprint is disclosed (the insertion above is performed while no match is found at step 110 in fig. 1) for benefits of achieving a more accurate recognition by using audio fingerprint in complex conditions such as noises, psychoacoustic compression artifacts, small amounts of time compression and expansion, envelope changes, etc. (para [0009]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied adding the derived audio fingerprint to a database when there is no match with a stored audio fingerprint, as taught by Richards, to adding the derived audio fingerprint to a database in the method, as taught by the combination of Wang and Jaap, for the benefits discussed above.
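For illustration only, the add-on-no-match behavior recited in claim 10 can be sketched as below. This is a minimal sketch, assuming the database is an in-memory set and that a match means exact equality, which is a simplification of "substantially equivalent"; the function name is hypothetical and the code is not taken from Richards or from the application.

```python
from typing import Hashable, Set


def lookup_or_add(database: Set[Hashable], fingerprint: Hashable) -> bool:
    """Return True when the fingerprint is already stored.

    Otherwise add it to the database and return False, so that a later
    occurrence of the same fingerprint is recognised.
    """
    if fingerprint in database:  # assumed exact-equality match
        return True
    database.add(fingerprint)
    return False
```

The first call for an unseen fingerprint returns False and stores it; a second call with the same fingerprint returns True.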

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (above) in view of Jaap (above), Richards (above), and Burges et al (US 20060106867 A1, Burges).
Claim 11: the combination of Wang, Jaap, and Richards teaches all the elements of claim 11, according to claim 10 above, except deleting the derived audio fingerprint from the database when the derived audio fingerprint is not accessed after a predetermined period of time since the derived audio fingerprint was added to the database.
Burges teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-17 and fig. 3) and wherein a fingerprint database is disclosed (combination of trace cache and fingerprint database, 240, 250 in fig. 3) and a derived audio fingerprint is disclosed (derived from the fingerprint data to the trace cache in fig. 3, and trace as fingerprint in the trace cache, para [0047]) wherein deleting the derived audio fingerprint from the database when the derived audio fingerprint is not accessed after a predetermined period of time (removing trace entries whose lifetime has expired, para [0093]) since the derived audio fingerprint was added to the database (expired with respect to the cache 240, para [0093]) for benefits of achieving improved performance in the fingerprint matching operation by providing a trace or fingerprint cache to accelerate data access and allow a large volume of concurrent accesses to the fingerprint database in real time (para [0007]-[0009]).
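For illustration only, the time-based deletion discussed above can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Burges or from the application: the class and method names are hypothetical, timestamps are passed in explicitly as seconds, and an access refreshes the stored timestamp.

```python
from typing import Dict, Hashable, List


class FingerprintStore:
    """Hypothetical store that forgets fingerprints left unused too long."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle = max_idle_seconds
        self._last_used: Dict[Hashable, float] = {}

    def add(self, fingerprint: Hashable, now: float) -> None:
        self._last_used[fingerprint] = now

    def access(self, fingerprint: Hashable, now: float) -> bool:
        """Return True and refresh the timestamp when the fingerprint is stored."""
        if fingerprint in self._last_used:
            self._last_used[fingerprint] = now
            return True
        return False

    def purge(self, now: float) -> List[Hashable]:
        """Delete and return fingerprints idle for longer than max_idle."""
        expired = [fp for fp, t in self._last_used.items()
                   if now - t > self.max_idle]
        for fp in expired:
            del self._last_used[fp]
        return expired
```

With a 60-second limit, a fingerprint added at time 0 and never accessed is removed by a purge at time 70, while one accessed at time 50 is retained.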
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein deleting the derived audio fingerprint from the database when the derived audio fingerprint is not accessed after a predetermined period of time since the derived audio fingerprint was added to the database, as .

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (above) in view of Jaap (above) and Burges et al (US 20060106867 A1, Burges).
Claim 12: the combination of Wang and Jaap teaches all the elements of claim 12, according to claim 1 above, except deleting any one of the plurality of stored fingerprints from the database when the stored audio fingerprint is not matched after a predetermined period of time.
Burges teaches an analogous field of endeavor by disclosing a method (title and abstract, ln 1-17 and fig. 3) and wherein a fingerprint database is disclosed (combination of trace cache and fingerprint database, 240, 250 in fig. 3) and a derived audio fingerprint is disclosed (derived from the fingerprint data to the trace cache in fig. 3, and trace as fingerprint in the trace cache, para [0047]) and wherein deleting any one of the plurality of stored fingerprints from the database when the stored audio fingerprint is not matched after a predetermined period of time (simply remove traces from the cache after they fail to match any incoming traces for some short period of time, on the order of about one second, para [0024]) for benefits of achieving improved performance in the fingerprint matching operation by providing a trace or fingerprint cache to accelerate data access and allow a large volume of concurrent accesses to the fingerprint database in real time (para [0007]-[0009]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein deleting any of the plurality of 

Allowable Subject Matter
Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and if claim amendment is performed only to overcome claim objection as set forth above.

Response to Arguments

Applicant's arguments filed on November 29, 2021 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by the Applicant's amendment. The Examiner has thoroughly reviewed the Applicant's arguments but firmly believes that the cited references reasonably and properly meet the claimed limitations.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached on 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/LESHUI ZHANG/
Primary Examiner, Art Unit 2654