DETAILED ACTION
This is responsive to the amendment filed 15 December 2021.
Claims 1-9 remain pending and are considered below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 15 December 2021 have been fully considered but they are not persuasive.

Regarding the claim rejections under 35 USC § 112(b), Applicant argues that the amendments overcome the rejections. However, some of the rejections still stand (see below).

Regarding the rejection under 35 USC § 103, Applicant argues:
The phrase "multilingual text keywords and/or audio keywords" refers to multilingual text keywords and/or multilingual audio keywords throughout the claimed patent document including the present claims.
The Examiner respectfully disagrees. In the claims reciting the phrase, “multilingual” modifies “text keywords” and not “audio keywords”. If Applicant wishes “multilingual” to modify both, the claims should recite ‘multilingual text keywords and/or multilingual audio keywords’.
Applicant also argues:
First, as explained above neither Ajmera nor Mantena teach a method to search multilingual text keywords in mixlingual speech corpus by converting multilingual text keywords to articulatory classes and subclasses. 
Second, neither Ajmera nor Mantena teach a method to combine text and audio keywords which is one of the modes of inputting keywords for search. As reported in [0028] of the present specification, the combined mode can have higher search performance in non-training languages. One implementation to combine text and audio keyword is given in the present specification (see [0068]). 
Third, neither Ajmera nor Mantena teach the apparatus to detect articulatory classes and subclasses by using training data from multiple languages (given in [0061],[0064] and figure 8 of the present specification). This apparatus of detecting articulatory classes and subclasses improved the keyword search in audio data consisting of non-training languages.
However, the claims do not require the features described above. In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., the three features argued above) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).


Applicant further argues:
Mantena do not teach the method to convert multilingual text keywords into a common set of articulatory classes and subclasses. One implementation of this method is given in [0067] of the present specification.
Ajmera also does not cover the search of multilingual text keywords in mixlingual speech.
Also, neither Ajmera nor Mantena teaches combined mode wherein both the text and corresponding audio keywords can be entered simultaneously and their information can be combined. This mode gives enhanced performance for searching keywords from non-training languages.
The Examiner respectfully disagrees. First, converting multilingual text keywords is claimed in the alternative and is not required wherein said keyword includes audio keywords. Mantena discloses inputting said keyword data, wherein said keyword includes plurality of audio keywords (“Let Q = {q1, q2,..., qi,..., qn} be a spoken query (or query) containing n feature vector”, section 3, paragraph 1). Second, “wherein both the text and corresponding audio keywords can be entered simultaneously and their information can be combined” is not claimed.

Regarding the rejection under 35 USC § 103 of claim 4, Applicant argues:
Mantena in view of Ajmera do not teach the method as claimed in claim 2, wherein before converting said input keyword to obtain plurality of articulatory information, pre-training said electronic device to detect articulatory classes and subclasses information associated with said input keyword.


Regarding the rejection under 35 USC § 103 of claim 7, Applicant argues:
Mantena has used only one language (Telugu) for training their model (Section 5, second paragraph). Neither Ajmera nor Mantena teaches an apparatus to detect articulatory classes and subclasses by using training data from multiple languages (given in [0061], [0064] and figure 8 of the present specification). Use of multiple languages for training results in more accurate detection of articulatory classes and subclasses of audio containing non-training languages.
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., an apparatus to detect articulatory classes and subclasses by using training data from multiple languages) are not recited in the rejected claims.  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

Applicant's remaining arguments either have been addressed above or fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites the limitation “wherein converting, by said electronic device, said multilingual text keywords into a sequence of phonemes” in lines 1-2. First, “converting, by said electronic device, said multilingual text keywords into a sequence of phonemes” lacks proper antecedent basis in the claim. Second, the wherein clause is not followed by a further limitation. The metes and bounds of the claim are therefore indefinite. According to Applicant’s arguments, it is believed the claim should read:
The method as claimed in claim 1, further comprising converting, by said electronic device, said multilingual text keywords of said keyword data into a sequence of phonemes, and converting said sequence of phonemes into information associated with said articulatory classes and subclasses.
Claim 6, in lines 7-8, recites the limitation “said information associated with articulatory classes and subclasses”. It is unclear if this limitation refers back to the antecedent basis in the claim or in parent claim 1. Further claim 6 recites “converting, by 
In claim 7, lines 4-5, the limitation “said articulatory classes and subclasses information pre-recorded in said multilingual speech-based storage system” lacks proper antecedent basis in the claim. The limitation will be interpreted as ‘[[said]] articulatory classes and subclasses information pre-recorded in said multilingual speech-based storage system’.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Mantena et al. ("Use of articulatory bottle-neck features for query-by-example spoken term detection in low resource scenarios." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014) in view of Ajmera et al. (US PGPub 2013/0007035).
Claim 1:
Mantena discloses a method for performing at least keyword data search (Abstract), the method comprising the steps of:
inputting said keyword data, wherein said keyword includes plurality of multilingual text keywords and/or audio keywords (“Let Q = {q1, q2,..., qi,..., qn} be a spoken query (or query) containing n feature vector”, section 3, paragraph 1); 
converting said input keyword data to obtain plurality of articulatory classes and subclasses information (“Each of these feature vectors represent a Gaussian, articulatory or phone posteriorgrams as computed in Sections 4 and 5”, section 3, paragraph 1), wherein said device includes a multilingual speech-based storage system having a plurality of records of information associated with said articulatory classes and subclasses (“Let R = {u1, u2,..., uj,..., um} be the spoken audio (or reference) containing m feature vectors. Each of these feature vectors represent a Gaussian, articulatory or phone posteriorgrams as computed in Sections 4 and 5”, section 3, paragraph 1, see section 2, paragraph 1 for multilingual speech-based storage system, see section 4, paragraph 2 for classes and sub-classes); 
matching said articulatory classes and subclasses information obtained from said input keyword data with said plurality of records to obtain a result (“The distance measure between a query vector qi and a reference vector uj is given … We define the term search hit as the region in the reference R that is likely to contain the query Q”, section 3, paragraph 1). 
Mantena does not explicitly disclose the data keyword search comprises performing at least a multimodal keyword data search by using an electronic device.
In a similar data keyword search method, Ajmera discloses performing at least a multimodal keyword data search (query being in a form of at least one of: text and audio) by using an electronic device (“an apparatus comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to accept a search query in a first language variety, the search query being in a form of at least one of: text and audio”, [0006]).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to have combined the references to yield the predictable result of performing Mantena’s keyword data search as a multimodal search in order to provide the user different ways of inputting data such as speech and/or text.
Claim 2:
Mantena in view of Ajmera discloses the method as claimed in claim 1, wherein said electronic device is selected from a client device or a server device or any combinations thereof (Ajmera, [0019], see also [0049]).
Claim 3:

Claim 4:
Mantena in view of Ajmera discloses the method as claimed in claim 2, wherein before converting said input keyword to obtain plurality of articulatory information, pre-training said electronic device to detect articulatory classes and subclasses information associated with said input keyword (Mantena, Introduction, paragraph 5, see also section 5, paragraph 2). 
Claim 5 (appears intended to further limit multilingual text keywords which was claimed in the alternative and was not used to teach parent claim 1):
Mantena in view of Ajmera discloses the method as claimed in claim 1, further comprising converting, by said electronic device, said multilingual text keywords of said keyword data into a sequence of phonemes, and converting said sequence of phonemes into information associated with said articulatory classes and subclasses (Ajmera, “query (either in spoken form or text form) is represented as a phonetic lattice”, [0036]). 
Claim 7:

Claim 8:
Mantena discloses a system to perform at least a keyword data search, wherein the system is:
adapted to receive at least an input corresponding to said keyword includes plurality of multilingual text keywords and/or audio keywords (“Let Q = {q1, q2,..., qi,..., qn} be a spoken query (or query) containing n feature vector”, section 3, paragraph 1); 
convert said keyword received from said client device to obtain plurality of articulatory information (“Each of these feature vectors represent a Gaussian, articulatory or phone posteriorgrams as computed in Sections 4 and 5”, section 3, paragraph 1); and 
convert audio data containing multilingual speech recording into a plurality of records having data associated with articulatory classes and sub-classes information; a multilingual speech-based storage system recording said plurality of records having data associated with articulatory classes and sub-classes information (“Let R = {u1, u2,..., uj,..., um} be the spoken audio (or reference) containing m feature vectors. Each of these feature vectors represent a Gaussian, articulatory or phone posteriorgrams as computed in Sections 4 and 5”, section 3, paragraph 1, see section 2, paragraph 1 for multilingual speech-based storage system, see section 4, paragraph 2 for classes and sub-classes);
adapted to perform matching of articulatory information associated with said keyword with said plurality of records to generate a result (“The distance measure between a query vector qi and a reference vector uj is given … We define the term search hit as the region in the reference R that is likely to contain the query Q”, section 3, paragraph 1). 
Mantena does not explicitly disclose wherein the system configured to perform multimodal keyword data search comprises a client device, a server device communicably coupled to said client device and a processor module; wherein said client device comprises an input receiving device for receiving the input, the server device comprises conversion models for converting the keyword and the processor module performs the matching.
In a similar data keyword search method, Ajmera discloses performing at least a multimodal keyword data search (query being in a form of at least one of: text and audio) by using a system (“an apparatus comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to accept a search query in a first language variety, the search query being in a form of at least one of: text and audio”, [0006]).
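It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to have combined the references to yield the predictable result of performing Mantena’s keyword data search as a multimodal search in order to provide the user different ways of inputting data such as speech and/or text.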
Ajmera further discloses the system configured to perform multimodal keyword data search comprises a client device, a server device communicably coupled to said client device and a processor module; wherein said client device comprises an input receiving device for receiving the input, the server device comprises conversion models for converting the keyword and the processor module performs a matching (“Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices”, [0018], see also “Computer system/server 12 may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network”, [0019], see also [0039] and [0025]).

Claim 9:
Mantena in view of Ajmera discloses the system as claimed in claim 8, wherein said client device further comprises a conversion module, adapted to convert said input keyword data to obtain plurality of articulatory information (Mantena, section 3, paragraph 1).

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY whose telephone number is (571)270-1058. The examiner can normally be reached Monday-Friday 9:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SAMUEL G NEWAY/Primary Examiner, Art Unit 2657