Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action


1.	The Examiner acknowledges the applicant’s amendment filed February 8, 2021.  At this point claims 1-25 are pending in the instant application and ready for examination by the Examiner.

2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 8, 2021 has been entered.


Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing 

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim(s) 1-4, 8-9, 11-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gao in view of Lynch, in view of Liu and further in view of Seltzer. (U. S. Patent Publication 20150363688, referred to as Gao; U. S. Patent 7587374, referred to as Lynch; U. S. Patent 9147129, referred to as Liu; U. S. Patent Publication 20130253930, referred to as Seltzer)

Claim 1
Gao discloses a computer implemented method for adapting a model for recognition processing to a target-domain, performed by a computer device, the method comprising: preparing a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model. (Gao, abstract, 0084; ‘source documents’ and ‘In various tested implementations, deep semantic models were trained on a training corpus such as the data sets described above (e.g., Wikipedia® page browsing events or the like), ….’ of Gao)

Gao does not disclose expressly obtaining a second distribution in relation to the part of the model by using data of the target-domain.
Lynch discloses obtaining a second distribution in relation to the part of the model by using data of the target-domain. (Lynch, fig 2, c9:55-67; ‘The Mean-Field BDRA is then retrained for each point in a target data set and training errors are calculated for each training operation.’ and item 34 in figure 2 of Lynch. EC: Lynch performs an initial training (item 22, fig 2), obtains a second set of data (item 34, fig 2) and then retrains (item 36).) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao and Lynch before him before the effective filing date of the claimed invention, to modify Gao to incorporate a testing or retraining data set of Lynch. Given the advantage of being able to refine the model without the bias of the previous data set, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gao and Lynch do not disclose expressly tuning one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small.
Liu discloses tuning one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small. (Liu, c14:57-66; In summary, a stacked learning framework can be employed to re-use base-level training data for meta-level learning. The problem can be addressed as a knowledge transfer, and can include first applying a histogram to re-balancing to 
Gao, Lynch and Liu do not disclose expressly wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning.
Seltzer discloses wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning. (Seltzer, 0067; The cluster component 402 can employ unsupervised clustering to characterize environments associated with the utterances 202. For example, the cluster component 402 can use a Gaussian mixture model trained on silence regions of the utterances 202 in the training data or adaptation data. EC: A cluster is viewed as a region.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu and Seltzer 

Claim 2
Gao discloses wherein the model includes a neural network having an input layer and a plurality of layers on top of the input layer, the part being one or more lower layers among the plurality of the layers and the input layer, the first and second distributions being output distributions from the part by feeding the data into the input layer from the training-domain and the target-domain, respectively. (Gao, fig 5; The input layer is the step between items 520 and 530. Plurality of layers is item 540. First and second distributions maps to item 520 and its function. Training and target maps to context and focus of Gao.)

Claim 3
Gao discloses wherein the part of the model is one or more lowest layers among the plurality of the layers and the input layer. (Gao, fig 5; Wherein the part of the model is one or more lowest layers among the plurality of the layers and the input layer (assuming a neural network input layer) maps to the beginning of item 530 of Gao.)

Claim 4
Gao discloses wherein the part includes a convolutional layer and a subsampling layer on top of the convolutional layer (Gao, 0007; Wherein the part includes a convolutional layer and a subsampling layer on top of the convolutional layer of applicant maps to ‘Each of these contexts is then mapped to a separate vector. Each vector is then mapped to a convolutional layer of a deep neural network or the like.’ of Gao.), the first and second distributions being output distributions from the subsampling layer. (Gao, 0041; The first and second distributions being output distributions from the subsampling layer of applicant maps to the function or result of the pair extraction module. This is the initial starting point of the context and focus distributions.)
 
Claim 8
Gao discloses wherein the data of the training-domain and the data of the target-domain are both split into a plurality of classes by utilizing supervised information (Gao, 0104; The training-domain and the data of the target-domain are both split into a plurality of classes by utilizing supervised information of applicant maps to ‘In contrast, the DSM learned by the Interestingness Modeler represents documents as points in a hidden semantic space using a supervised learning method, i.e., paired documents are closer in that latent space than unpaired ones.’ of Gao.), the plurality of the classes including each class representing a phone, a group of phones, or a group of multi-phones, both the first and the second distributions including a distribution for each class (Gao, 0126; The plurality of the classes including each class representing a phone, a group of phones, or a group of multi-phones, both the first and the second distributions including a distribution for each class of applicant maps to ‘Examples of such devices Gao, 0080; Calculating at least one difference between the first and the second distributions for each class of applicant maps to ‘For example, consider a source document $s$ and two candidate target documents $t_1$ and $t_2$, where $t_1$ is more interesting than $t_2$ to a user when reading $s$. The Interestingness Modeler constructs two pairs of documents $(s, t_1)$ and $(s, t_2)$, where the former is preferred and typically has a higher interestingness score. Let $\Delta$ be the difference of their interestingness scores, following Equation 1.’ of Gao.); and combining the at least one calculated difference over the plurality of the classes. (Gao, 0112; Combining the at least one calculated difference over the plurality of the classes of applicant maps to ‘Thus, assuming mapping to the first anchor, this can be formally stated as follows. Let $A_s$ be the set of all anchors in $s$ and let $t_a$ be the target document linked to by anchor $a \in A_s$. The Interestingness Modeler then selects the $k$ anchors in $A_s$ that maximize the cumulative interest, according to: $\arg\max_{A_s^k = (a_1, \ldots, a_k \mid a_i \in A_s)} \sum_{a_i \in A_s^k} \sigma(s, t_{a_i})$ (Equation 10), where $\sigma(s, t_a) = 0$ for all $a \notin A_s$.’ of Gao.)
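As an illustrative sketch only (it is not taken from Gao), the selection described in the quoted Equation (10) can be pictured as picking the k anchors with the highest interestingness scores, since the cumulative interest is a sum of independent per-anchor terms. The function name `select_top_k_anchors` and the dictionary of scores are hypothetical; the scores stand in for the values of sigma(s, t_a) that Gao's model would produce.

```python
import heapq


def select_top_k_anchors(scores, k):
    """Return the k anchors with the largest scores, highest first.

    `scores` is a hypothetical mapping from an anchor identifier to its
    interestingness score sigma(s, t_a). Because the objective is a plain
    sum over the selected anchors, taking the k largest terms maximizes it.
    """
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical scores for three anchors in a source document s.
example_scores = {"a1": 0.2, "a2": 0.9, "a3": 0.5}
top_two = select_top_k_anchors(example_scores, 2)
```

Anchors absent from the mapping simply never enter the selection, which is consistent with the quoted condition that the score is zero for anchors outside the anchor set.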


Claim 9
Gao discloses wherein the obtaining the second distribution and the tuning the one or more parameters are iterated until the difference meets a predetermined condition. (Gao, 0088; Wherein the obtaining the second distribution and the tuning the one or more parameters are iterated until the difference meets a predetermined condition of applicant maps to ‘For example, assuming an initial value of $\eta = 1.0$, after each epoch (i.e., a pass over the entire training data), the learning rate is adjusted as $\eta = 0.5 \times \eta$ (or any other desired weight) if the loss on validation data is not reduced. The training stops if either $\eta$ is smaller than a preset threshold or the loss on training data can no longer be reduced significantly. In various tested implementations, it was observed that DSM training typically converges within about 20 epochs.’ of Gao.)
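The iteration and stopping rule quoted from Gao, 0088 can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not Gao's implementation: `run_epoch` is a hypothetical callback returning the training and validation loss for one epoch, "not reduced" is assumed to mean no improvement over the best validation loss seen so far, and the thresholds `min_eta` and `min_gain` are placeholder values.

```python
def train_with_halving(run_epoch, eta=1.0, min_eta=1e-3, min_gain=1e-4,
                       max_epochs=100):
    """Iterate epochs, halving the learning rate when validation stalls.

    run_epoch(eta) -> (train_loss, val_loss) is a hypothetical callback.
    Returns a list of (epoch, eta_used, train_loss, val_loss) tuples.
    """
    best_val = float("inf")
    prev_train = float("inf")
    history = []
    for epoch in range(max_epochs):
        train_loss, val_loss = run_epoch(eta)
        history.append((epoch, eta, train_loss, val_loss))
        if val_loss >= best_val:
            eta *= 0.5  # validation loss not reduced: halve the rate
        else:
            best_val = val_loss
        # Stop when the rate falls below the threshold or the training
        # loss no longer improves by a meaningful amount.
        if eta < min_eta or prev_train - train_loss < min_gain:
            break
        prev_train = train_loss
    return history


# Hypothetical loss sequence standing in for real training.
_losses = iter([(1.0, 1.0), (0.5, 0.9), (0.4, 0.95), (0.39995, 0.96)])
demo_history = train_with_halving(lambda eta: next(_losses))
```

In this sketch the loop condition plays the role of the "predetermined condition": iteration continues only while the monitored quantity still improves.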

Claim 11
Gao discloses performing an additional training to the tuned model by using training data with a label from the target-domain in a supervised manner. (Gao, 0104; Performing an additional training to the tuned model by using training data with a label from the target-domain in a supervised manner of applicant maps to ‘In contrast, the DSM learned by the Interestingness Modeler represents documents as points in a hidden semantic space using a supervised learning method, i.e., paired documents are closer in that latent space than unpaired ones.’ of Gao.)

Claim 12
Gao discloses wherein the adapted model provides an acoustic model for speech recognition processing. (Gao, 0058; Wherein the adapted model provides an acoustic model for speech recognition processing of applicant maps to ‘The Interestingness Modeler provides a DSM derived from a deep neural network with convolutional structure that is highly effective for speech and image tasks.’ of Gao.)

Claim 13
Gao discloses wherein the preparing, the obtaining and the tuning are performed in a cloud computing environment. (Gao, 0134; Wherein the preparing, the obtaining and the tuning are performed in a cloud computing environment of applicant maps to ‘The implementations described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks.’ of Gao.)

Claim 14
Gao discloses wherein the preparing, the obtaining and the tuning are performed by one or more computer devices. (Gao, 0127; Wherein the preparing, the obtaining and the tuning are performed by one or more computer devices of applicant maps to ‘Further, the computing device 600 may also include optional system firmware 625 (or other firmware or processor accessible memory or storage) for use in implementing various implementations of the Interestingness Modeler.’ of Gao.)


Claim 15
Gao discloses a computer system for adapting a model for recognition processing to a target-domain, by executing program instructions, the computer system comprising: a memory tangibly storing the program instructions; a processor in communications with the memory, wherein the computer system is configured to (Gao, 0127; A memory tangibly storing the program instructions; a processor in communications with the memory, wherein the computer system is configured to of applicant maps to ‘Further, the computing device 600 may also include optional system firmware 625 (or other firmware or processor accessible memory or storage) for use in implementing various implementations of the Interestingness Modeler.’ of Gao.): prepare a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model. (Gao, abstract, 0084; Prepare a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model of applicant maps to ‘source documents’ and ‘In various tested implementations, deep semantic models were trained on a training corpus such as the data sets described above (e.g., Wikipedia® page browsing events or the like),….’ of Gao)
Gao does not disclose expressly obtain a second distribution in relation to the part of the model by using data of the target-domain.
Lynch discloses obtain a second distribution in relation to the part of the model by using data of the target-domain. (Lynch, fig 2, c9:55-67; Obtain a second distribution in relation to the part of the model by using data of the target-domain of applicant maps to ‘The Mean-Field BDRA is then retrained for each point in a target data set and 
Gao and Lynch do not disclose expressly tune one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small.
Liu discloses tune one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small. (Liu, c14:57-66; In summary, a stacked learning framework can be employed to re-use base-level training data for meta-level learning. The problem can be addressed as a knowledge transfer, and can include first applying a histogram to re-balancing to the marginal distribution of source-domain features (e.g., base-classifier score output on held-in data) according to target-domain features (e.g., score output on held-out data). From there, an adaptation of the TriAdaBoost algorithm can be used, such as with a weighted least-square fusion learner, such as for training the meta-level score fusion.) It would have been obvious to one having ordinary skill in the art, having the teachings of 
Gao, Lynch and Liu do not disclose expressly wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning.
Seltzer discloses wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning. (Seltzer, 0067; The cluster component 402 can employ unsupervised clustering to characterize environments associated with the utterances 202. For example, the cluster component 402 can use a Gaussian mixture model trained on silence regions of the utterances 202 in the training data or adaptation data. EC: A cluster is viewed as a region.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu and Seltzer before him before the effective filing date of the claimed invention, to modify Gao, Lynch and Liu to incorporate balancing output distributions of a classifier of Seltzer. Given the advantage of avoiding bias within the submitted data, one having ordinary skill in the art would have been motivated to make this obvious modification. 


Claim 16
Gao discloses wherein the model includes a neural network having an input layer and a plurality of layers on top of the input layer, the part being one or more lower layers among the plurality of the layers and the input layer, the first and second distributions being output distributions from the part by feeding the data into the input layer from the training-domain and the target-domain, respectively. (Gao, fig 5; The input layer is the step between items 520 and 530. Plurality of layers is item 540. First and second distributions maps to item 520 and its function. Training and target maps to context and focus of Gao.)

Claim 17
Gao discloses wherein the data of the training-domain and the data of the target-domain both are split into a plurality of classes, both the first and the second distributions including a distribution for each class (Gao, fig 1; The data of the training-domain and the data of the target-domain both are split into a plurality of classes, both the first and the second distributions including a distribution for each class of applicant maps to identifying pairs of source and target documents of Gao.), the computer system being further configured to: calculate at least one difference between the first and the second distributions for each class (Gao, 0080; Calculating at least one difference between the first and the second distributions for each class of applicant maps to ‘For example, consider a source document $s$ and two candidate target documents $t_1$ and $t_2$, where $t_1$ is more interesting than $t_2$ to a user when reading $s$. The Interestingness Modeler constructs two pairs of documents (s, Gao, 0112; Combine the at least one calculated difference over the plurality of the classes of applicant maps to ‘Thus, assuming mapping to the first anchor, this can be formally stated as follows. Let $A_s$ be the set of all anchors in $s$ and let $t_a$ be the target document linked to by anchor $a \in A_s$. The Interestingness Modeler then selects the $k$ anchors in $A_s$ that maximize the cumulative interest, according to: $\arg\max_{A_s^k = (a_1, \ldots, a_k \mid a_i \in A_s)} \sum_{a_i \in A_s^k} \sigma(s, t_{a_i})$ (Equation 10), where $\sigma(s, t_a) = 0$ for all $a \notin A_s$.’ of Gao.)

Claim 18
Gao discloses determine whether the difference meets a predetermined condition; and obtain the second distribution and tune the one or more parameters repeatedly in response to determining that the difference does not meet the predetermined condition. (Gao, 0088; Determine whether the difference meets a predetermined condition; and obtain the second distribution and tune the one or more parameters repeatedly in response to determining that the difference does not meet the predetermined condition of applicant maps to ‘For example, assuming an initial value of $\eta = 1.0$, after each epoch (i.e., a pass over the entire training data), the learning rate is adjusted as $\eta = 0.5 \times \eta$ (or any other desired weight) if the loss on validation data is not reduced. The training stops if either $\eta$ is smaller than a preset threshold or 

Claim 19
Gao discloses a computer program product for adapting a model for recognition processing to a target-domain, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: preparing a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model. (Gao, abstract, 0084; Preparing a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model of applicant maps to ‘source documents’ and ‘In various tested implementations, deep semantic models were trained on a training corpus such as the data sets described above (e.g., Wikipedia® page browsing events or the like), ….’ of Gao)
Gao does not disclose expressly obtaining a second distribution in relation to the part of the model by using data of the target-domain.
Lynch discloses obtaining a second distribution in relation to the part of the model by using data of the target-domain. (Lynch, fig 2, c9:55-67; Obtaining a second distribution in relation to the part of the model by using data of the target-domain of applicant maps to ‘The Mean-Field BDRA is then retrained for each point in a target data set and training errors are calculated for each training operation.’ and item 34 in 
Gao and Lynch do not disclose expressly tuning one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small.
Liu discloses tuning one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small. (Liu, c14:57-66; In summary, a stacked learning framework can be employed to re-use base-level training data for meta-level learning. The problem can be addressed as a knowledge transfer, and can include first applying a histogram to re-balancing to the marginal distribution of source-domain features (e.g., base-classifier score output on held-in data) according to target-domain features (e.g., score output on held-out data). From there, an adaptation of the TriAdaBoost algorithm can be used, such as with a weighted least-square fusion learner, such as for training the meta-level score fusion.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao and Lynch and Liu before him before the effective filing date of the claimed 
Gao, Lynch and Liu do not disclose expressly wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning.
Seltzer discloses wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning. (Seltzer, 0067; The cluster component 402 can employ unsupervised clustering to characterize environments associated with the utterances 202. For example, the cluster component 402 can use a Gaussian mixture model trained on silence regions of the utterances 202 in the training data or adaptation data. EC: A cluster is viewed as a region.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu and Seltzer before him before the effective filing date of the claimed invention, to modify Gao, Lynch and Liu to incorporate balancing output distributions of a classifier of Seltzer. Given the advantage of avoiding bias within the submitted data, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 20
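Gao discloses wherein the model includes a neural network having an input layer and a plurality of layers on top of the input layer, the part being one or more lower layers among the plurality of the layers and the input layer, the first and second distributions being output distributions from the part by feeding the data into the input layer from the training-domain and the target-domain, respectively. (Gao, fig 5; The input layer is the step between items 520 and 530. Plurality of layers is item 540. First and second distributions maps to item 520 and its function. Training and target maps to context and focus of Gao.)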

Claim 21
Gao discloses wherein the data of the training-domain and the data of the target-domain both are split into a plurality of classes, both the first and the second distributions including a distribution for each class (Gao, fig 1; The training-domain and the data of the target-domain both are split into a plurality of classes, both the first and the second distributions including a distribution for each class of applicant maps to identifying pairs of source and target documents of Gao.), the tuning including: calculating at least one difference between the first and the second distributions for each class (Gao, 0080; Calculating at least one difference between the first and the second distributions for each class of applicant maps to ‘For example, consider a source document $s$ and two candidate target documents $t_1$ and $t_2$, where $t_1$ is more interesting than $t_2$ to a user when reading $s$. The Interestingness Modeler constructs two pairs of documents $(s, t_1)$ and $(s, t_2)$, where the former is preferred and typically has a higher interestingness score. Let $\Delta$ be the Gao, 0112; Combining the at least one calculated difference over the plurality of the classes of applicant maps to ‘Thus, assuming mapping to the first anchor, this can be formally stated as follows. Let $A_s$ be the set of all anchors in $s$ and let $t_a$ be the target document linked to by anchor $a \in A_s$. The Interestingness Modeler then selects the $k$ anchors in $A_s$ that maximize the cumulative interest, according to: $\arg\max_{A_s^k = (a_1, \ldots, a_k \mid a_i \in A_s)} \sum_{a_i \in A_s^k} \sigma(s, t_{a_i})$ (Equation 10), where $\sigma(s, t_a) = 0$ for all $a \notin A_s$.’ of Gao.)

Claim 22
Gao discloses a computer implemented method for adapting a neural network to a target-domain, performed by a processor, the method comprising: preparing a first output distribution from one or more lower layers of the neural network on a memory operably coupled to the processor (Gao, 0042; From one or more lower layers of the neural network on a memory operably coupled to the processor of applicant maps to ‘A DSM Training Module 140 then map context and optional focus of each document to separate vectors. This is done through a neural network, i.e., the context and optional focus are first fed into the input layer of the neural network,….’ of Gao.), the first output distribution being derived from data of a training-domain for the neural network (Gao, abstract, 0084; Preparing a first output distribution …. the first output distribution being derived from data of a training-domain for the neural network of applicant maps to ‘source documents’ of Gao); …. from the one or more lower layers of the neural network by feeding data of the target-domain into the neural network (Gao, 0042; From the one or more lower layers of the neural network by feeding data of the target-domain into the neural network of applicant maps to ‘A DSM Training Module 140 then map context and optional focus of each document to separate vectors. This is done through a neural network, i.e., the context and optional focus are first fed into the input layer of the neural network,….’ and ‘In various tested implementations, deep semantic models were trained on a training corpus such as the data sets described above (e.g., Wikipedia® page browsing events or the like), ….’ of Gao.); and….
Gao does not disclose expressly calculating a second output distribution.
Lynch discloses calculating a second output distribution. (Lynch, fig 2, c9:55-67; Calculating a second output distribution of applicant maps to ‘The Mean-Field BDRA is then retrained for each point in a target data set and training errors are calculated for each training operation.’ and item 34 in figure 2 of Lynch. EC: Lynch performs an initial training (item 22, fig 2), obtains a second set of data (item 34, fig 2) and then retrains (item 36).) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao and Lynch before him before the effective filing date of the claimed invention, to modify Gao to incorporate a testing or retraining data set of Lynch. Given the advantage of being able to refine the model without the bias of the previous data set, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gao and Lynch do not disclose expressly tuning one or more parameters of the one or more lower layers of the neural network based on the data of the target domain and one or more output distributions from another part of the model by calculating a 
Liu discloses tuning one or more parameters of the one or more lower layers of the neural network based on the data of the target domain and one or more output distributions from another part of the model by calculating a change in the one or more parameters so as to minimize a difference between the first and the second output distributions based on the first and the second output distributions. (Liu, c14:57-66; In summary, a stacked learning framework can be employed to re-use base-level training data for meta-level learning. The problem can be addressed as a knowledge transfer, and can include first applying a histogram to re-balancing to the marginal distribution of source-domain features (e.g., base-classifier score output on held-in data) according to target-domain features (e.g., score output on held-out data). From there, an adaptation of the TriAdaBoost algorithm can be used, such as with a weighted least-square fusion learner, such as for training the meta-level score fusion.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao and Lynch and Liu before him before the effective filing date of the claimed invention, to modify Gao and Lynch to incorporate balancing output distributions of a classifier of Liu. Given the advantage of avoiding bias within the submitted data, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gao, Lynch and Liu do not disclose expressly wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the 
Seltzer discloses wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning. (Seltzer, 0067; The cluster component 402 can employ unsupervised clustering to characterize environments associated with the utterances 202. For example, the cluster component 402 can use a Gaussian mixture model trained on silence regions of the utterances 202 in the training data or adaptation data. EC: A cluster is viewed as a region.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu and Seltzer before him before the effective filing date of the claimed invention, to modify Gao, Lynch and Liu to incorporate balancing output distributions of a classifier of Seltzer. Given the advantage of avoiding bias within the submitted data, one having ordinary skill in the art would have been motivated to make this obvious modification. 

Claim 23
Gao discloses wherein the one or more lower layers are lowest layers including a convolutional layer on an input layer of the neural network (Gao, 0007; Wherein the one or more lower layers are lowest layers including a convolutional layer on an input layer of the neural network of applicant maps to ‘Each of these contexts is then mapped to a separate vector. Each vector is then mapped to a convolutional layer of a deep neural network or the like.’ of Gao.) and a subsampling layer on top of the convolutional layer, the first and second output distributions being output distributions from the subsampling layer by feeding the data into the input layer from the training-domain and the target-domain, respectively. (Gao, 0041; A subsampling layer on top of the convolutional layer, the first and second output distributions being output distributions from the subsampling layer by feeding the data into the input layer from the training-domain and the target-domain, respectively of applicant maps to the function or result of the pair extraction module. This is the initial starting point of the context and focus distributions.)

Claim 24
Gao discloses a computer system for adapting a model for recognition processing to a target-domain, the computer system comprising: a preparing module configured to prepare a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model. (Gao, abstract, fig 1, 0084; A preparing module configured to prepare a first distribution in relation to a part of the model, the first distribution being derived from data of a training-domain for the model of applicant maps to ‘source documents’ and ‘In various tested implementations, deep semantic models were trained on a training corpus such as the data sets described above (e.g., Wikipedia® page browsing events or the like),….’ of Gao. First distribution maps to the ‘context’ portion of the context and focus extraction module of Gao.)
Gao does not disclose expressly an obtaining module configured to obtain a second distribution in relation to the part of the model by using data of the target-domain.
Lynch discloses an obtaining module configured to obtain a second distribution in relation to the part of the model by using data of the target-domain. (Lynch, fig 2, c9:55-67; An obtaining module configured to obtain a second distribution in relation to the part of the model by using data of the target-domain of applicant maps to ‘The Mean-Field BDRA is then retrained for each point in a target data set and training errors are calculated for each training operation.’ and item 34 in figure 2 of Lynch. EC: Lynch has an initial training (item 22, fig 2), obtains a second set of data (item 34, fig 2) and then retrains (item 36).) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao and Lynch before him before the effective filing date of the claimed invention, to modify Gao to incorporate a testing or retraining data set of Lynch. Given the advantage of being able to refine the model without the bias of the previous data set, one having ordinary skill in the art would have been motivated to make this obvious modification.
Gao and Lynch do not disclose expressly a tuning module configured to tune one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small.
Liu discloses a tuning module configured to tune one or more parameters of the part of the model based on the data of the target domain and one or more output distributions from another part of the model so that a difference between the first and the second distributions becomes small. (Liu, c14:57-66; In summary, a stacked learning framework can be employed to re-use base-level training data for meta-level learning. The problem can be addressed as a knowledge transfer, and can include first 
Gao, Lynch and Liu do not disclose expressly wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning.
Seltzer discloses wherein each of the target-domain and the training-domain data is split into utterance regions and silence regions and the utterance regions and the silence regions are both separately evaluated and results from each of the regions are combined for use in additional parameter tuning. (Seltzer, 0067; The cluster component 402 can employ unsupervised clustering to characterize environments associated with the utterances 202. For example, the cluster component 402 can use a Gaussian mixture model trained on silence regions of the utterances 202 in the training data or adaptation data. EC: A cluster is viewed as a region.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu and Seltzer before him before the effective filing date of the claimed invention, to modify Gao, Lynch and Liu to incorporate balancing output distributions of a classifier of Seltzer. Given the advantage of avoiding bias within the submitted data, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 25
Gao discloses a splitting module configured to split the data of the training-domain and the data of the target-domain into a plurality of classes (Gao, fig 1; A splitting module configured to split the data of the training-domain and the data of the target-domain into a plurality of classes of applicant maps to the ‘pair extraction module’ of Gao.); and/or an additional training module configured to perform an additional training to the tuned model by using training data with a label from the target-domain in a supervised manner. (Gao, fig 1; An additional training module configured to perform an additional training to the tuned model by using training data with a label from the target-domain in a supervised manner of applicant maps to the portion of the DSM training module which pertains to ‘minimize distance between vectors of interesting source and target documents’ of Gao.)


Claim(s) 5-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gao, Lynch, Liu and Seltzer as applied to claims 1-4, 8-9 and 11-25 above, and further in view of Choi. (U. S. Patent Publication 20080167863, referred to as Choi)


Claim 5
Gao discloses…. calculating at least one difference between the first and the second distributions for both the silence and utterance regions. (Gao, 0080; Calculating at least one difference between the first and the second distributions for both the silence and utterance regions of applicant maps to ‘For example, consider a source document s and two candidate target documents t₁ and t₂, where t₁ is more interesting than t₂ to a user when reading s. The Interestingness Modeler constructs two pairs of documents (s, t₁) and (s, t₂), where the former is preferred and typically has a higher interestingness score. Let Δ be the difference of their interestingness scores, following Equation 1.’ of Gao.)
Gao, Lynch, Liu and Seltzer do not disclose expressly wherein the data of the training-domain and the data of the target-domain both include silence and utterance regions, the tuning including.
Choi discloses wherein the data of the training-domain and the data of the target-domain both include silence and utterance regions, the tuning including. (Choi, fig 3; The data of the training-domain and the data of the target-domain both include silence and utterance regions of applicant maps to the voice signal separator module which generates unvoiced sound and voiced sound.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu, Seltzer and Choi before him before the effective filing date of the claimed invention, to modify Gao, Lynch, Liu and Seltzer to incorporate both the spoken word and silence of Choi. Given the advantage of the silence being an indication of a separation between one word and another, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 6
Gao discloses wherein the data of the training-domain and the data of the target-domain are both split into a plurality of classes in an unsupervised manner (Gao, 0104; The training-domain and the data of the target-domain are both split into a plurality of classes in an unsupervised manner of applicant maps to the example of ‘Various existing ranking techniques, such as bilingual topic models (BLTM), for example, use a generative model where semantic representation is a distribution of hidden semantic topics that is learned using Maximum Likelihood Estimation in an unsupervised manner, i.e., maximizing the log-likelihood of the source-target document pairs in the training data.’ of Gao.) …. calculating at least one difference between the first and the second distributions for each class (Gao, 0080; Calculating at least one difference between the first and the second distributions for each class of applicant maps to ‘For example, consider a source document s and two candidate target documents t₁ and t₂, where t₁ is more interesting than t₂ to a user when reading s. The Interestingness Modeler constructs two pairs of documents (s, t₁) and (s, t₂), where the former is preferred and typically has a higher interestingness score. Let Δ be the difference of their interestingness scores, following Equation 1.’ of Gao.); and combining the at least one calculated difference over the plurality of the classes. (Gao, 0112; Combining the at least one calculated difference over the plurality of the classes of applicant maps to ‘Thus, assuming mapping to the first anchor, this 
Gao, Lynch, Liu and Seltzer do not disclose expressly the plurality of classes including class representing utterance regions and class representing silence regions, both the first and the second distributions including a distribution for each class.
Choi discloses the plurality of classes including class representing utterance regions and class representing silence regions, both the first and the second distributions including a distribution for each class. (Choi, fig 3; The plurality of classes including class representing utterance regions and class representing silence regions, both the first and the second distributions including a distribution for each class of applicant maps to the voice signal separator module which generates unvoiced sound and voiced sound.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu, Seltzer and Choi before him before the effective filing date of the claimed invention, to modify Gao, Lynch, Liu and Seltzer to incorporate both the spoken word and silence of Choi. Given the advantage of the silence being an indication of a separation between one word and another, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 7
Gao discloses…. in an unsupervised manner (Gao, 0104; In an unsupervised manner of applicant maps to the example of ‘Various existing ranking techniques, such as bilingual topic models (BLTM), for example, use a generative model where semantic representation is a distribution of hidden semantic topics that is learned using Maximum Likelihood Estimation in an unsupervised manner, i.e., maximizing the log-likelihood of the source-target document pairs in the training data.’ of Gao.), the tuning including: calculating at least one difference between the first and the second distributions for the utterance regions. (Gao, 0080; Calculating at least one difference between the first and the second distributions for the utterance regions of applicant maps to ‘For example, consider a source document s and two candidate target documents t₁ and t₂, where t₁ is more interesting than t₂ to a user when reading s. The Interestingness Modeler constructs two pairs of documents (s, t₁) and (s, t₂), where the former is preferred and typically has a higher interestingness score. Let Δ be the difference of their interestingness scores, following Equation 1.’ of Gao.)
Gao, Lynch, Liu and Seltzer do not disclose expressly wherein the data of the training-domain and the data of the target-domain both include utterance regions split from whole data including the utterance regions and silence regions.
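Choi discloses wherein the data of the training-domain and the data of the target-domain both include utterance regions split from whole data including the utterance regions and silence regions. (Choi, fig 3; Wherein the data of the training-domain and the data of the target-domain both include utterance regions split from whole data including the utterance regions and silence regions of applicant maps to the voice signal separator module which generates unvoiced sound and voiced sound.) It would have been obvious to one having ordinary skill in the art, having the teachings of Gao, Lynch, Liu, Seltzer and Choi before him before the effective filing date of the claimed invention, to modify Gao, Lynch, Liu and Seltzer to incorporate both the spoken word and silence of Choi. Given the advantage of the silence being an indication of a separation between one word and another, one having ordinary skill in the art would have been motivated to make this obvious modification.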


Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gao, Lynch, Liu and Seltzer as applied to claims 1-4, 8-9 and 11-25 above, and further in view of Mun. (U. S. Patent Publication 20150088873, referred to as Mun)

Claim 10
Gao, Lynch, Liu and Seltzer do not disclose expressly the difference is calculated by means of square error or cross-entropy and a loss function is set using the means square error or the cross-entropy.
Mun discloses the difference is calculated by means of square error or cross-entropy and a loss function is set using the means square error or the cross-entropy. (Mun, 0493; The difference is calculated by means of square error or cross-entropy and a loss function is set using the means square error or the cross-entropy of applicant maps to ‘Mean Squared Error (MSE) is an absolute error measure that squares the errors (the difference between the actual historical data and the forecast-fitted data predicted by the model) to keep the positive and negative errors from canceling each 


4.	Claims 1-25 are rejected.


Conclusion	
5.	The prior art of record and not relied upon is considered pertinent to the applicant’s disclosure.
	-Search terms: ‘target domain’, ‘target distribution’, ‘training domain’, ‘training distribution’ and ‘neural network’
	-U. S. Patent Publication 20150332670: Akbalak
	-U. S. Patent 8660973: Feigenbaum


Correspondence Information
6.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Li Zhen can be reached at (571) 272-3768.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)









/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121