DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the response to this office action, the Examiner respectfully requests that support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line numbers in the specification and/or drawing figure(s). This will assist the Examiner in prosecuting this application.

Specification
The specification fails to disclose what E24 and E25 are, although both labels are shown in fig. 2. It is therefore unclear what E24 and E25 represent in fig. 2.
Appropriate correction is required.

Drawing
Fig. 2 of the drawings shows a group of processing operations labeled “E24” and “E25”, but provides no further description of what E24 and E25 are.
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) ELEMENT IN CLAIM FOR A COMBINATION.—An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

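The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.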
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
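For illustration only, the three-prong test and the two rebuttable presumptions above can be sketched as a simple decision function. This is a simplified sketch, not an official procedure: the class name, field names, and the two example limitations are assumptions introduced here, and each boolean stands in for a judgment that in practice requires reading the claim in light of the specification.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical summary of one claim limitation (all fields are assumptions)."""
    text: str
    uses_means_or_step: bool        # literally recites "means" or "step"
    uses_generic_placeholder: bool  # nonce term such as "device" or "module"
    has_functional_language: bool   # e.g. "configured to ...", "for ..."
    has_sufficient_structure: bool  # recites structure that entirely performs the function


def invokes_112f(lim: Limitation) -> bool:
    """Return True when all three prongs (A), (B), and (C) are met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.has_sufficient_structure
    return prong_a and prong_b and prong_c


# Example inputs; the boolean values are illustrative assumptions.
placeholder_only = Limitation(
    text="a classification device configured to receive sounds ...",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    has_sufficient_structure=False,
)
with_structure = Limitation(
    text="a device comprising a processor and a medium storing instructions ...",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    has_sufficient_structure=True,
)

print(invokes_112f(placeholder_only))
print(invokes_112f(with_structure))
```

The sketch folds both presumptions into prong (C): a limitation reciting “means” escapes 35 U.S.C. 112(f) only when sufficient structure is recited, and a limitation without “means” falls under it only when a generic placeholder is coupled with function and no sufficient structure.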

Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) limitation:  
a classification device in claim 7 – module CLASS in fig. 1, receiving a signal and analyzing the signal to classify sound in the signal, para [0079]-[0080] (USPGPub 20210098005 A1, hereinafter), as application software implemented by a processor PROC in figs. 3-5, para [0105];
an identification device in claim 7 – equivalent to an interpretation module in fig. 1, para [0087]-[0091], as application software implemented by a processor PROC in figs. 3-5, para [0105];
an enrichment device in claim 8 – equivalent to an enrichment module ENRCH in fig. 1, para [0096], as application software implemented by a processor PROC in figs. 3-5, para [0105].

If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f) applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or present a sufficient showing that the claim recites/recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f).
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Objections
Claims 1-8 are objected to because of the following informalities: 
Claim 1 recites “a non-transitory computer-readable medium comprising instructions stored thereon which when executed by the processor configure the identification device to:” which should be -- a non-transitory computer-readable medium comprising instructions stored thereon which when executed by the processor
Claim 1 further recites “identify said scene from the at least two sounds captured …” which should be -- identify said scene from
Claims 2-6 are objected to due to their dependency on claim 1.
Claim 2 further recites “said identifying device identifies the scene among …” which should follow the proposed amendment in parent claim 1 and is recommended to be -- said processor
Claim 2 further recites “for identifying a scene according to claim 1” which should be -- for identifying [[a]]the scene according to claim 1--. Claim 2 further recites “…scene being arranged in chronological order” which should be --…scene being arranged in the chronological order--.
Claim 3 is further objected to for at least the same reasons as described for claim 2 above, because claim 3 recites deficient features similar to those recited in claim 2. For example, claim 3 recites “for identifying a scene according to claim 1” and “the instructions configure the identification device to received …”. Claim 4 is objected to due to its dependency on claim 3.
Claim 4 is further objected to for at least the same reasons as described for claim 3 above, because claim 4 recites deficient features similar to those recited in claim 3. For example, claim 4 further recites “The identification device for identifying a scene according to …” and “the instructions configured the identification device to …”.
Claim 5 is further objected to for at least the same reasons as described for claim 3 above, because claim 5 recites deficient features similar to those recited in claim 3. For example, claim 5 recites “for identifying a scene according to
Claim 6 is further objected to for at least the same reasons as described for claim 3 above, because claim 6 recites deficient features similar to those recited in claim 3. For example, claim 6 recites “for identifying a scene according to claim 1” and “the instructions configure the identification device to transmit …”.
Claim 7 recites “wherein said system comprises:” and “each sound received”, which should be --wherein said identification system comprises:-- and --each of the sounds received--, respectively. Claim 7 further recites “captured by the capture devices”, which should be --captured by capture devices-- because “capture devices” is first introduced at this point. Claim 8 is objected to due to its dependency on claim 7.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 4 and 10 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 4 recites “the captured sounds being associated with several possible sound classes”, which is unclear because the word “possible” leaves open whether the captured sounds are in fact associated with several sound classes or with none (the possible sound classes may include no sound class at all), and thus renders the claim indefinite. Note: the word “possible” modifying “sound classes” is uncertain, so it cannot be determined whether any “sound classes” are actually required.
Claim 10 recites “updating, at least one database, using at least one part of the …”, which is unclear because it cannot be determined whether the “at least one database” is updated by using the “at least one part of the …”, or whether the “at least one part of the …” is updated at the “at least one database”, and thus renders the claim indefinite.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 5, 9, and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bansal et al. (US 20160077574 A1, hereinafter Bansal).
Claim 1: Bansal teaches an identification device (title and abstract, ln 1-13, a device 100 or part of device 100 in fig. 1) for identifying a scene (e.g., an acoustic event or scene for wakeup, para [0010]) in an environment (the environment in fig. 1, para [0010]), said environment comprising at least one sound capture device (e.g., microphone 108 in fig. 1), said identification device comprising:
a processor (a processor, para [0076]; e.g., element 404 in fig. 4, 502 in fig. 5); and
a non-transitory computer-readable medium (a storage medium, para [0076]) comprising instructions stored thereon (instructions and computer programs stored thereon, para [0076]) which when executed by the processor configure the identification device to:

Claim 9 has been analyzed and rejected according to claim 1 above, and Bansal further teaches each of said at least two sounds being associated respectively with at least one sound class (both w1 and w2 as wakeup candidates in fig. 2b are the classes of the wakeups, respectively, which are close to each other in acoustic meaning and in a time period, para [0051]-[0056]).
Claim 11 has been analyzed and rejected according to claims 1 and 9 above.
Claim 5: Bansal further teaches, according to claim 1 above, wherein the instructions configure the identification device to trigger at least one action to be performed following the identification of said scene (e.g., performing a task, such as making a call, after the wakeup is recognized, para [0003]; e.g., after “Hello Dragon” is recognized as the wakeup, “call home” follows, para [0033]).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Bansal (above) in view of Clarkson et al. (“Auditory Context Awareness via Wearable Computing”, Proceedings of 1998 Workshop on Perceptual User Interfaces, January 1, 1998, XP 055677044, pages 1-6, hereinafter Clarkson; IDS submitted on September 25, 2020).
Claim 2: Bansal teaches all the elements of claim 2, according to claim 1 above, including wherein said identifying device identifies the scene among a group of predefined scenes (Bansal, a group including a wakeup audio event and a non-wakeup audio event corresponding to a False Accept, para [0005]), except wherein each predefined scene being associated with a predetermined number of marker sounds, said marker sounds of a predefined scene being arranged in chronological order.
Clarkson teaches an analogous field of endeavor by disclosing an identification device for identifying a scene in an environment (title and abstract, ln 1-11 and fig. a device called Nomadic Radio including a microphone to collect sound data, Nomadic Radio being a wearable computing platform, section Application on page 3) and wherein identifying a scene (including one of biking, supermarket, home, in fig. 4, abstract) from at least two sounds captured in said environment (more than one sound object presented in an event, paragraph 3 of col 2, p.3, Sound Object Classification; e.g., sound objects include speech, telephone ring, passing car, etc., section Introduction, p.1) by an at least one sound capture device (a wireless lavalier M 185, section 3.1 Auditory Input, col 2, page 2) is disclosed, each of said at least two sounds being associated respectively with at least one sound class (e.g., variety of speech and non-speech sounds, section 3.2 Feature Extraction, col 2, page 2; e.g., active sound objects and inactive sound objects in fig. 5), said scene being identified by taking account of a chronological order (figs. 3, 4) in which said at least two sounds were captured (sound objects in figs. 3, 4; more than one sound object in an event, para [0003] of col 2 at page 3) and wherein the identifying device is configured to identify the scene among a group of predefined scenes (such as office, supermarket, busy street, etc., abstract; e.g., shown in fig. 4), each predefined scene being associated with a predetermined number of marker sounds (labeling sound objects in fig. 3; active sound marked with solid circles in fig. 5), said marker sounds of a predefined scene being arranged in chronological order (sound waves at the first row of fig. 4, mapped to different scenes and scene transitions in fig. 4) for benefits of achieving an improvement in operation performance of acoustic scene awareness by achieving a real-time practice and application (the last paragraph at col 2, page 1).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein each predefined scene being associated with the predetermined number of marker sounds, said marker sounds of the predefined scene being arranged in chronological order and other features, as taught by Clarkson, to the identifying of the scene among the group of predefined scenes in the identification device, as taught by Bansal, for the benefits discussed above.
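For illustration only, the claimed limitation at issue in claim 2 (each predefined scene being associated with marker sounds arranged in chronological order) can be sketched as an ordered-subsequence match. This is a minimal sketch under stated assumptions: the scene names, the sound class names, and the matching rule are hypothetical, and it does not reproduce the method of the application, of Bansal, or of Clarkson.

```python
from typing import Dict, List, Optional


def markers_in_order(markers: List[str], observed: List[str]) -> bool:
    """True if every marker sound class occurs in `observed` in the same order."""
    remaining = iter(observed)
    # `m in remaining` advances the iterator, so order is enforced.
    return all(m in remaining for m in markers)


def identify_scene(observed: List[str], scenes: Dict[str, List[str]]) -> Optional[str]:
    """Return the first predefined scene whose markers appear chronologically."""
    for name, markers in scenes.items():
        if markers_in_order(markers, observed):
            return name
    return None


# Hypothetical predefined scenes and their chronologically ordered marker sounds.
SCENES = {
    "leaving_home": ["footsteps", "door_open", "door_close"],
    "cooking": ["tap_water", "chopping", "frying"],
}

# Sound classes of the captured sounds, in the order they were captured.
captured = ["footsteps", "speech", "door_open", "door_close"]
print(identify_scene(captured, SCENES))
```

The same sound classes captured in a different order do not match, which is the sense in which the chronological order is taken into account in this sketch.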
Claim 3: the combination of Bansal and Clarkson further teaches, according to claim 1 above, wherein the instructions configure the identification device to receive at least one piece of complementary data provided by a connected device from said environment (Bansal, a GUI provided for the device in fig. 5, para [0075], and Clarkson, hand labeling some scene transitions in fig. 4, para [0002] of col 1 at page 5, and thus, inherently a connected device used for hand labeling above) and associate a label with the sound class of at least one of the captured sounds or with said identified scene (Clarkson, hand labeling with the identified scene transitions and sound waves in fig. 4).

Claims 4, 6-8, 10 are rejected under 35 U.S.C. 103 as being unpatentable over Bansal (above) and in view of references Clarkson (above) and Stephanson (US 20090180628 A1).
Claim 4: the combination of Bansal and Clarkson teaches all the elements of claim 4, according to claim 3 above, including wherein the instructions configure the identification device to, in response to at least one of the captured sounds being associated with several possible sound classes (Bansal, several possible sounds, wakeup sound and non-wake sound, para [0005], and Clarkson, speech, telephone ring, passing car, etc., section Introduction, page 1), determine a sound class of the several possible sound classes for the at least one of said captured sound (Bansal, using the acoustic model for scoring, para [0034], and Clarkson, using trained HMM to discriminate the speech and sounds, section 3.2 Feature Extraction), except using said at least one piece of complementary data received to perform the disclosed determination of the sound class of the several possible sound classes for the at least one of said captured sound.
Stephanson teaches an analogous field of endeavor by disclosing an identification device (title and abstract, ln 1-11 and a system in fig. 4) and wherein in response to at least one of the captured sounds (a microphone used for performing signal acquisition at the device 120, para [0028]) being associated with several possible sound classes (signal database 424 storing predefined acoustic events, para [0040]; a characteristic frequency spectrum such as an oscillation at a specific frequency band of the monitored audio signal as an event, para [0020]), determine a sound class of the several possible sound classes for the at least one captured sound (captured by the microphone or acquisition device 220 in fig. 6, para [0028], the predefined events or predefined frequency spectrum oscillation stored in a database and compared with the processed event for classifying the event, para [0040]) using at least one piece of complementary data received (other inputs 440, by using different acquisition device, para [0042]; combining into multiple events for further processing, para [0042], including classification implemented by the event determination 425 with the processing unit 421 and the database 424 in fig. 4, para [0040]) for benefits of achieving an improvement of performance by increasing accuracy assessment and operation efficiency (para [0003], para [0022], para [0105]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein it is using the at least one piece of complementary data received to perform the determination of the sound class of the several possible sound classes for the at least one of said captured sound, as taught by Stephanson, to the determination of the sound class of the several possible sound classes in the identification device, as taught by the combination of Bansal and Clarkson, for the benefits discussed above.
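For illustration only, the claim 4 limitation (determining one sound class among several possible sound classes by using complementary data from a connected device) can be sketched as below. This is a minimal sketch under stated assumptions: the event names, the sound class names, and the `hints` mapping are hypothetical, and the sketch does not represent the processing described in the application or in Stephanson.

```python
from typing import Dict, List, Optional


def resolve_sound_class(
    candidates: List[str],
    complementary_events: List[str],
    hints: Dict[str, str],
) -> Optional[str]:
    """Pick one class among several candidates for a single captured sound.

    candidates: possible sound classes for the captured sound.
    complementary_events: events reported by connected devices (assumed input).
    hints: hypothetical mapping from a complementary event to the class it supports.
    """
    supported = {hints[e] for e in complementary_events if e in hints}
    matches = [c for c in candidates if c in supported]
    # Decide only when the complementary data singles out exactly one class.
    return matches[0] if len(matches) == 1 else None


# Hypothetical mapping from connected-device events to sound classes.
HINTS = {
    "door_sensor_closed": "front_door_closing",
    "cupboard_sensor_closed": "cupboard_closing",
}

ambiguous = ["front_door_closing", "cupboard_closing"]
print(resolve_sound_class(ambiguous, ["door_sensor_closed"], HINTS))
```

When no complementary event is available, or when the events support more than one candidate, the sketch returns None and the ambiguity remains unresolved.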
Claim 6: the combination of Bansal and Clarkson teaches all the elements of claim 6, according to claim 1 above, including a piece of information indicating the scene identified (Bansal, an output of element 324 in fig. 3d for the following task of adaptation with specific speaker, para [0002], by using adapted model 152 in fig. 1a, para [0035], and Clarkson, identified scene event such as biking, supermarket, home, etc., in fig. 4), at least two sound classes (Bansal, the wakeup sound represented by the characteristics w1 and w2 in fig. 2b and Clarkson, one or more sound objects in the event and figs. 3, 4, the discussion in claim 2 above) and a chronological order associated with the identified scene (Bansal, the markers t1 and t2 with the wakeup w1, w2, in fig. 2b and Clarkson, chronological order with identified scenes in fig. 4), at least one part of the audio files corresponding to the captured sounds associated respectively with a sound class (Bansal, audio files from target users and corresponding transcriptions, para [0002]), and at least one sound class associated with a label (Bansal, w1, w2 in a time line marked as t1 and t2, respectively, in fig. 2b and Clarkson, figs. 3, 4), except wherein the instructions configure the identification device to transmit to an enrichment device at least one part of the data disclosed above.
Stephanson teaches an analogous field of endeavor by disclosing an identification device (title and abstract, ln 1-11 and a system in fig. 4) and wherein transmitting to an enrichment device (a remote second event processing system, para [0064], e.g., refinement of event determination processing in fig. 5, para [0064]-[0065]) a piece of information indicating the scene identified (identified the conditioned data outputted from the processing unit 421 in fig. 4) for the benefits as discussed in claim 4 above.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the transmission of data to the enrichment device, as taught by Stephanson, to the piece of information indicating the scene identified, and at least two sound classes and a chronological order associated with the identified scene, at least one part of the audio files corresponding to the captured sounds associated respectively with the sound class, and at least one sound class associated with the label used in the identification device, as taught by the combination of Bansal and Clarkson, for the benefits discussed above.
Claim 7 has been analyzed and rejected according to claim 1 above, and the combination of Bansal, Clarkson, and Stephanson further teaches:
a classification device (Stephanson, a system in fig. 4, including MEMS acquisition device 220, e.g., array of MEMS devices, para [0107]) configured to receive sounds captured by the capture devices in said environment (Bansal, the microphone 108 in fig. 1 and Clarkson, the wireless microphone, and the discussion in claim 2 above, and Stephanson, output from the element 220 in fig. 4), and determine, for each sound received, at least one sound class (Bansal, w1 and w2 in fig. 2b and the discussion in claim 1 above and Clarkson, one or more sound objects in the event and the discussion in claim 2 above, and Stephanson, classified to a predetermined event or specific spectrum oscillation, para [0040]);
an identification device (Stephanson, refined condition system 500 in fig. 5) configured to identify said scene from at least two of the captured sounds by taking account of a chronological order in which the at least two captured sounds were captured (Bansal, identified wakeup w1 and w2 at marker t1 and t2 in fig. 2b, and the discussion in claim 1 above, and 
Claim 8 has been analyzed and rejected according to claims 7 and 6 above, and the combination of Bansal, Clarkson, and Stephanson further teaches wherein the enrichment device is configured to update at least one database with at least one part of the data transmitted by the identification device (Stephanson, the trained data 514 is updated according to the linear discriminant analysis 513 in fig. 5, e.g., by download from a remote database, including new event information, para [0062]-[0063]).
Claim 10 has been analyzed and rejected according to claims 9 and 8 above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589.  The examiner can normally be reached on Monday-Friday 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached on 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 

/LESHUI ZHANG/
Primary Examiner, Art Unit 2654