DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination (RCE) under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on August 12, 2022 has been entered by way of the RCE filed on August 26, 2022.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed August 12, 2022 (herein “Amendment”), with respect to the rejections of claims 1 and 13, and claims depending therefrom under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Watanabe et al., "Topic tracking language model for speech recognition," Computer Speech & Language, Volume 25, Issue 2, 2011, Pages 440-461, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2010.07.006. (Year: 2011).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6, 8-9, 11-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al., (WO 2017/020011 A1, herein “Huang”), further in view of Watanabe et al., "Topic tracking language model for speech recognition," Computer Speech & Language, Volume 25, Issue 2, 2011, Pages 440-461, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2010.07.006 (herein “Watanabe”).
Regarding claim 1, Huang teaches a speech analysis method, comprising (Huang paras. [0080] and [0094], recording, analyzing and playback of audio data corresponding to a teleconference, the audio data including telephone data with instances of speech): 
dividing speech data into a plurality of segments (Huang paras. [0094]-[0095], data packets being analyzed by both a voice activity detector and an analysis module, where the voice activity detector includes a talkspurt number indicating an instance of speech (segment) (as determined each time the voice activity detector determines that speech has recommenced after a period of non-speech), and each packet also receives a timestamp);
aligning the plurality of segments (Huang para. [0108], the data packet stream is re-ordered (aligning) according to the timestamp data) based on meta information of the plurality of segments (Huang paras. [0095]-[0098], packet trace files include receive timestamp, talkspurt number, and received sequence number (meta information), and the packet trace files are associated with more metadata including conference participant name and location); 
extracting a keyword list of each segment (Huang paras. [0233]-[0238], [0247], and [0274], in topic analysis methods a word list is formed by analyzing speech recognition results to determine words spoken and their frequency of occurrence, where para. [0294] and fig. 21 teach that the topic analysis is invoked separately for each segment of the conference, generating a topic list 2111 for each segment (hence also generating a word list, since fig. 25 illustrates the topic analysis as including generating a word list from which the topic list is generated));
modeling topic information of each segment (Huang para. [0136], topic analysis module analyzes the speech recognition results and identifies potential conference topics, where the topic analysis outputs the segment and word cloud data 309, which para. [0294] and fig. 25 disclose is performed for each segment) based on the keyword list (Huang para. [0248], the topic list is based on the word list); and
generating structured speech data (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata (that have been generated) are stored in the conference recording database 3 as data structures represented as tables),
wherein the modeling the topic information of each segment comprises: determining a topic probability distribution of each segment (Huang paras. [0170], [0264], [0275], [0284], and [0288]-[0291], fig. 24A, the results of the speech recognition are speech recognition lattices, where the topic analysis module determines a term frequency metric for words in the speech recognition lattice, and the word list is sorted in descending order of the term frequency (para. [0232] teaching that term frequency is the number of occurrences of a word in the speech recognition lattices), the word list being used to determine a weighted topic list from which a word cloud is generated, and arranged in the user interface at least according to a time-wise distribution, but also in a descending order of topic frequency); and 
of at least one keyword included in the keyword list based on the topic probability distribution (Huang para. [0275], the word list is sorted in descending order of the term frequency), and
wherein the structured speech data comprises the meta information (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata (that have been generated) are stored in the conference recording database 3 as data structures represented as tables).
Huang does not explicitly teach determining a topic probability value.
Huang further does not explicitly teach that the keyword list, the topic probability distribution of each segment, and the topic probability value of at least one keyword included in the keyword list based on the topic probability distribution, are part of the structured speech data.
Watanabe teaches determining a topic probability value (Watanabe pages 442-443, topic tracking language model including drawing a topic probability value $\phi_{t}$ from the Dirichlet distribution).
Watanabe further teaches the keyword list, the topic probability distribution of each segment, and the topic probability value of at least one keyword included in the keyword list based on the topic probability distribution (Watanabe page 453, fig. 6 illustrating the topic probability distribution per chunk (segment) over 11 chunks, and table 4 illustrating the top 10 nouns (keywords in keyword list) for a particular topic in the top 3 topics (topics 26, 15 and 3) of the distribution shown in fig. 6 (based on the topic probability distribution), and pages 443-444 teaching the topic tracking language model drawing $\theta_{tk}$ (word probability) for a topic k (topic probability value of keyword)).
Therefore, considering the teachings of Huang and Watanabe together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the word cloud generation including the storage of various structured data as disclosed in Huang to include the specific aspects of generating a topic tracking language model cited above and disclosed in Watanabe at least because doing so would allow for improved performance in lecture and conference presentation language model adaptation tasks requiring topic tracking (see Watanabe page 457, section 5).
Regarding claim 2, Huang teaches wherein the speech data comprises at least one piece of speech data acquired in at least one space (Huang paras. [0094]-[0096], packet trace files containing instances of speech from a conference participant at a participant location (at least one space)).
Regarding claim 3, Huang teaches wherein the dividing the speech data comprises dividing the speech data at a silent gap in the speech data (Huang para. [0094], the data packets processed with a voice activity detector that detects instances of speech and non-speech (silent gap in the speech data), and where a talkspurt number is included in data packets that include speech, the talkspurt number being incremented each time the voice activity detector determines the speech has recommenced after a period of non-speech (thus the talkspurt number indicating a division of the speech data per “talkspurts” of speech)).
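For illustration only, the talkspurt numbering characterized above can be sketched as follows. This is a minimal sketch and not code from Huang: the function name `assign_talkspurts` and the representation of the voice activity detector output as one boolean per packet are assumptions made for the example.

```python
from typing import List, Optional


def assign_talkspurts(vad_flags: List[bool]) -> List[Optional[int]]:
    """Give each speech packet a talkspurt number; non-speech packets get None.

    Assumption: vad_flags holds one speech/non-speech decision per packet,
    in arrival order, as produced by some voice activity detector.
    """
    numbers: List[Optional[int]] = []
    current = 0
    previous_was_speech = False
    for is_speech in vad_flags:
        # Increment the counter each time speech recommences after non-speech.
        if is_speech and not previous_was_speech:
            current += 1
        numbers.append(current if is_speech else None)
        previous_was_speech = is_speech
    return numbers
```

Under these assumptions, each run of consecutive speech packets shares one number, so the non-speech periods between runs act as the silent gaps at which the speech data is divided.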
Regarding claim 4, Huang teaches wherein the meta information comprises start time information, duration information, place information, and speaker information of each segment (Huang paras. [0115] and [0119]-[0120], conference recording database containing data for a conference including a receive timestamp for data packet, meeting overview information including the time of a conference (start time information), names of participants (speaker information of each segment) and conference participant location, where para. [0129] further teaches that speaker diarization data includes which conference participant and when a conference participant spoke, and para. [0099] teaches that conference metadata includes a summary of who participated in the conference and for how long (duration)).
Regarding claim 5, Huang teaches wherein the aligning the plurality of segments comprises aligning the plurality of segments based on an alignment reference determined according to a type of the meta information (Huang paras. [0100], [0108], [0115], and [0122], uplink analysis module re-orders data packets of a packet trace file based on sequence numbers (alignment reference) which corresponds to a talkspurt number determined from the meta information being the packet trace file type metadata).
Regarding claim 6, Huang teaches wherein the meta information comprises start time information of each segment, and the aligning the plurality of segments based on the alignment reference comprises aligning the plurality of segments in chronological order of the start time information (Huang paras. [0107]-[0109], data packet stream includes time stamp data (meta information of the start time of the packet (segment)) which is used to re-order a packet if it is out of order).
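For illustration only, re-ordering a packet stream into chronological order by timestamp can be sketched as follows. This is a minimal sketch under stated assumptions and not code from Huang: the `Packet` type, its field names, and the use of the sequence number as a tie-breaker are hypothetical.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; field names are assumptions for this sketch.
    timestamp: float
    sequence_number: int
    payload: bytes


def reorder_by_timestamp(packets: List[Packet]) -> List[Packet]:
    """Return the packets in chronological order of their timestamps.

    Assumption: ties on timestamp are broken by sequence number.
    """
    return sorted(packets, key=lambda p: (p.timestamp, p.sequence_number))
```

For example, a stream received with timestamps 2.0, 1.0, 3.0 would be returned in the order 1.0, 2.0, 3.0.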
Regarding claim 8, Huang teaches wherein the meta information comprises speaker information and start time information of each segment (Huang paras. [0096]-[0097], Fig. 2B, conference metadata including individual conference participants associated to the packet trace files of audio data when they were speaking), and the aligning the plurality of segments based on the alignment reference comprises (Huang fig. 24A, paras. [0289]-[0290], the user interface presenting the data of a conference recording): classifying the plurality of segments according to the speaker information; and aligning the plurality of segments classified according to the speaker information by speaker in chronological order of the start time information (Huang paras. [0290] and [0303], fig. 24A, controlling a display to present the user interface presenting a list of conference participants of the conference recording, and the waveforms corresponding to conference participant speech in time intervals on a time line (chronological order)).
Regarding claim 9, Huang teaches wherein the extracting the keyword list comprises: converting each segment into a text (Huang para. [0132], speech recognition results as a text file are provided to the topic analysis module); and extracting, based on the text, at least one keyword included in the segment (Huang paras. [0132], [0136], [0235] and [0247], fig. 21, word list produced as a part of the topic generation comes from words that were speech recognized corresponding to an actual word spoken by a conference participant during the conference).
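For illustration only, forming a word list from recognized text and sorting it in descending order of term frequency can be sketched as follows. This is a simplified sketch and not code from Huang: it counts words in a plain text transcript rather than in speech recognition lattices, and the function name `build_word_list` and the regular-expression tokenizer are assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple


def build_word_list(transcript: str) -> List[Tuple[str, int]]:
    """Return (word, count) pairs sorted by descending term frequency.

    Assumption: lowercase alphabetic tokens stand in for recognized words.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Applied to the hypothetical transcript "Topic model, topic speech. Topic model", the sketch returns the word "topic" first with a count of 3, followed by "model" with 2 and "speech" with 1.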
Regarding claim 11, Huang does not specifically disclose the limitations of claim 11. Watanabe teaches wherein the modeling the topic information of each segment comprises modeling the topic information by using latent Dirichlet allocation (LDA) (Watanabe page 449, fig. 3, LDA based topic model).
Therefore, considering the teachings of Huang and Watanabe together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the topic analysis as disclosed in Huang to use latent Dirichlet allocation as disclosed in Watanabe at least because doing so would allow for the modeling of topic probabilities with simpler parameter estimation (see Watanabe page 444).
Regarding claim 12, Huang teaches further comprising storing the structured speech data (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata are stored in the conference recording database 3 as data structures represented as tables, and analysis results from the analysis engine are stored in the analysis results database 5).
Regarding claim 13, Huang teaches a speech analysis apparatus, comprising (Huang fig. 1A, paras. [0057], [0059], [0070], and [0094], teleconferencing system that includes an analysis engine for analyzing uplink data packet streams which contain instances of speech, the application taking the form of a hardware and software embodiment): 
a memory configured to store speech data (Huang para. [0095], packet trace files, including a data packet payload of data packet streams with voice, are sent to (stored in) the conference recording database 3, where para. [0069] discloses the database 3 as stored in a storage system, and para. [0057] teaches aspects of the present invention including non-transitory media (memory)); and
at least one processor accessible to the memory, wherein the processor is configured to (Huang para. [0057], fig. 1A, the various modules and engines being a hardware embodiment, where the modules are shown as connected to the database 3):
divide the speech data into a plurality of segments (Huang paras. [0094]-[0095], data packets being analyzed by both a voice activity detector and an analysis module, where the voice activity detector includes a talkspurt number indicating an instance of speech (segment) (as determined each time the voice activity detector determines that speech has recommenced after a period of non-speech), and each packet also receives a timestamp);
align the plurality of segments (Huang para. [0108], the data packet stream is re-ordered (aligning) according to the timestamp data) based on meta information of the plurality of segments (Huang paras. [0095]-[0098], packet trace files include receive timestamp, talkspurt number, and received sequence number (meta information), and the packet trace files are associated with more metadata including conference participant name and location); 
extract a keyword list of each segment (Huang paras. [0233]-[0238], [0247], and [0274], in topic analysis methods a word list is formed by analyzing speech recognition results to determine words spoken and their frequency of occurrence, where para. [0294] and fig. 21 teach that the topic analysis is invoked separately for each segment of the conference, generating a topic list 2111 for each segment (hence also generating a word list, since fig. 25 illustrates the topic analysis as including generating a word list from which the topic list is generated));
model topic information of each segment (Huang para. [0136], topic analysis module analyzes the speech recognition results and identifies potential conference topics, where the topic analysis outputs the segment and word cloud data 309, which para. [0294] and fig. 25 disclose is performed for each segment) based on the keyword list (Huang para. [0248], the topic list is based on the word list); and
generate structured speech data (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata (that have been generated) are stored in the conference recording database 3 as data structures represented as tables),
wherein the processor is further configured to determine a topic probability distribution of each segment (Huang paras. [0170], [0264], [0275], [0284], and [0288]-[0291], fig. 24A, the results of the speech recognition are speech recognition lattices, where the topic analysis module determines a term frequency metric for words in the speech recognition lattice, and the word list is sorted in descending order of the term frequency (para. [0232] teaching that term frequency is the number of occurrences of a word in the speech recognition lattices), the word list being used to determine a weighted topic list from which a word cloud is generated, and arranged in the user interface at least according to a time-wise distribution, but also in a descending order of topic frequency) and of at least one keyword included in the keyword list based on the topic probability distribution (Huang para. [0275], the word list is sorted in descending order of the term frequency), and
wherein the structured speech data comprises the meta information (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata (that have been generated) are stored in the conference recording database 3 as data structures represented as tables).
Although Huang teaches that the system shown in fig. 1A can be implemented in a combination of hardware and software that includes non-transitory media, Huang does not specifically teach that the hardware is a processor. However, using a processor as a hardware component in a signal processing system is well known to one of ordinary skill in the art and yields predictable results. In addition, the hardware embodiments given in para. [0057] include a personal computer, which is also well known to include a processor. Therefore, modifying Huang to include a processor specifically as the hardware would be a simple substitution of one known element for another to obtain predictable results (see MPEP 2143(I)(B)).
Further, Huang does not explicitly teach determine a topic probability value for the modeling.
Huang further does not explicitly teach that the keyword list, the topic probability distribution of each segment, and the topic probability value of at least one keyword included in the keyword list based on the topic probability distribution, are part of the structured speech data.
Watanabe teaches determine a topic probability value for the modeling (Watanabe pages 442-443, topic tracking language model including drawing a topic probability value $\phi_{t}$ from the Dirichlet distribution).
Watanabe further teaches the keyword list, the topic probability distribution of each segment, and the topic probability value of at least one keyword included in the keyword list based on the topic probability distribution (Watanabe page 453, fig. 6 illustrating the topic probability distribution per chunk (segment) over 11 chunks, and table 4 illustrating the top 10 nouns (keywords in keyword list) for a particular topic in the top 3 topics (topics 26, 15 and 3) of the distribution shown in fig. 6 (based on the topic probability distribution), and pages 443-444 teaching the topic tracking language model drawing $\theta_{tk}$ (word probability) for a topic k (topic probability value of keyword)).
Therefore, considering the teachings of Huang and Watanabe together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the word cloud generation as disclosed in Huang including the storage of various structured data to include the specific aspects of generating a topic tracking language model cited above and disclosed in Watanabe at least because doing so would allow for improved performance in lecture and conference presentation language model adaptation tasks requiring topic tracking (see Watanabe page 457, section 5).
Regarding claim 14, Huang teaches wherein the speech data comprises at least one piece of speech data acquired in at least one space (Huang paras. [0094]-[0096], packet trace files containing instances of speech from a conference participant at a participant location (at least one space)).
Regarding claim 15, Huang teaches wherein the processor is further configured to divide the speech data at a silent gap in the speech data (Huang para. [0094], the data packets processed with a voice activity detector that detects instances of speech and non-speech (silent gap in the speech data), and where a talkspurt number is included in data packets that include speech, the talkspurt number being incremented each time the voice activity detector determines the speech has recommenced after a period of non-speech (thus the talkspurt number indicating a division of the speech data per “talkspurts” of speech)).
Regarding claim 16, Huang teaches wherein the meta information comprises start time information, duration information, place information, and speaker information of each segment (Huang paras. [0115] and [0119]-[0120], conference recording database containing data for a conference including a receive timestamp for data packet, meeting overview information including the time of a conference (start time information), names of participants (speaker information of each segment) and conference participant location, where para. [0129] further teaches that speaker diarization data includes which conference participant and when a conference participant spoke, and para. [0099] teaches that conference metadata includes a summary of who participated in the conference and for how long (duration)).
Regarding claim 17, Huang teaches wherein the processor is further configured to convert each segment to a text (Huang para. [0132], speech recognition results as a text file are provided to the topic analysis module) and extract at least one keyword included in the segment based on the text, so as to extract the keyword list (Huang paras. [0132], [0136], [0235] and [0247], fig. 21, word list produced as a part of the topic generation comes from words that were speech recognized corresponding to an actual word spoken by a conference participant during the conference).
Regarding claim 19, Huang teaches wherein the processor is further configured to store structured speech data in the memory (Huang paras. [0095]-[0097] and [0070], packet trace files and conference metadata are stored in the conference recording database 3 as data structures represented as tables, and analysis results from the analysis engine are stored in the analysis results database 5).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Huang in view of Watanabe, as set forth above regarding claim 5 from which claim 7 depends, further in view of Schellenberg, “Principles of Arrangement,” Staff Information Paper Number 18, Published by the National Archives and Records Administration, Washington, DC, 14 pages, Web Version, 1999 (herein “Schellenberg NPL”).
Regarding claim 7, Huang teaches wherein the meta information comprises place information and start time information of each segment (Huang paras. [0115] and [0120], conference recording database containing data for a conference including a receive timestamp for data packet (start time information of each segment), and conference participant location), and the aligning the plurality of segments based on the alignment reference comprises (Huang paras. [0100], [0108], [0115], and [0122], uplink analysis module re-orders data packets of a packet trace file based on sequence numbers (alignment reference) which corresponds to a talkspurt number determined from the meta information being the packet trace file type metadata, where the alignment is shown in figure 24A, disclosed in paras. [0289]-[0292] to include an order of conference participant speech and the corresponding conference participant over a timeline):
classifying the plurality of segments according to the place information (Huang paras. [0115] and [0120], conference metadata including a conference participant location, where para. [0097] discloses the conference metadata is in a tabular form with a packet trace file name, thus at least through the conference metadata table, the packet traces (segments) are associated to (classifying) the conference metadata which includes the participant location).
Although Huang does teach in para. [0120] that the conference metadata includes conference participant location, and at least, a grouping of the audio speech data by conference participant (see Fig. 24A), or tabular form which associates the conference metadata to the packet traces (see para. [0097]) and thus the corresponding conference participant location, Huang does not explicitly teach aligning the plurality of segments classified according to the place information by place in chronological order of the start time information.
Schellenberg NPL teaches aligning the plurality of segments classified according to the place information by place (Schellenberg NPL “Arrangement of Subgroups” section, in arranging records (plurality of segments) for records management, subgroups (classified) of a record group are arranged geographically) in chronological order of the start time information (Schellenberg NPL “Arrangement by Series” section, within the subgroups (e.g. location subgroups), series are arranged in the chronological order in which the activities were instituted (thus according to start time information)).
Therefore, considering the teachings of Huang and Schellenberg NPL together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the conference metadata tabulation as disclosed in Huang to be subgrouped and then chronologically ordered within location subgroups as disclosed in Schellenberg NPL at least because doing so would be following a universally accepted practice in maintaining records, which serves to protect the integrity of records, make known the character and significance of records, and provide a workable and economical guide in arranging, describing and servicing records (see Schellenberg NPL “Basic Principle of Arrangement” section).
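For illustration only, classifying segments by place and then ordering each place group chronologically by start time can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Huang or Schellenberg NPL: the `Segment` type, its field names, and the function name `align_by_place` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    # Hypothetical segment record; field names are assumptions for this sketch.
    place: str
    start_time: float
    text: str


def align_by_place(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Group segments by place, then sort each group by start time."""
    groups: Dict[str, List[Segment]] = defaultdict(list)
    for segment in segments:
        groups[segment.place].append(segment)
    # Places are emitted in alphabetical order; within each place the
    # segments are in chronological order of start time.
    return {
        place: sorted(group, key=lambda s: s.start_time)
        for place, group in sorted(groups.items())
    }
```

Under these assumptions, segments recorded at a hypothetical place "room_b" with start times 5.0 and 1.0 would be returned under the key "room_b" in the order 1.0, 5.0.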

 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Kusumura et al., US 2015/0193425 A1, directed towards storage in a data structure of word latent topic information; and Anders et al., US 10387574 B1, directed towards storage of a data structure containing topic model information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656