DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application submitted on 10/31/2018.
Claims 1-20 are presented for examination.
Claim Rejections - 35 USC § 102
A person shall be entitled to a patent unless -
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 7-14, 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Rozantsev et al (Beyond Sharing Weights for Deep Domain Adaptation, herein Rozantsev).
Regarding claim 1,
Rozantsev teaches an apparatus, comprising: at least one processor; and at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: (Rozantsev, page 801, column 1, paragraph 1, line 7 “To this end, we introduce a two-stream architecture, where one operates in the source domain and the other in the target domain.” In other words, two-stream architecture is a machine learning method.  It is implicit that a machine learning method requires at least one processor and at least one non-transitory computer storage capability and instructions executable by the at least one processor in order to execute.)
	access a first neural network, the first neural network being associated with a first data type; access a second neural network, the second neural network being associated with a second data type different from the first data type; (Rozantsev, Fig. 1, and page 803, column 2, paragraph 2, line 5 “To implement this idea, we therefore introduce a two-stream architecture, such as the one depicted by Fig. 1.  The first stream operates on the source data, the second on the target one, and they are trained jointly.”  

[Figure image: media_image1.png]

In other words, the first stream is the first neural network, the source data is the first data type, the second stream is the second neural network, the target data is the second data type, and (from Fig. 1) the target data (real images) differs from the source data (synthetic images), so the second data type is different from the first data type.)
provide, as input, first training data to the first neural network; provide, as input, second training data to the second neural network, the first training data being different from the second training data; (Rozantsev, page 803, column 2, paragraph 2, line 7 “The first stream operates on the source data, the second on the target one, and they are trained jointly.” In other words, source data (synthetic images) is first training data, first stream is first neural network, the target data (real images) is the second training data which is different from the first, the second stream is the second neural network, and the two streams of data are provided as input to the two neural networks, separately.)
	identify a first output from a first layer, the first layer being an output layer of the first neural network, the first output being based on the first training data; identify a second output from a second layer, the second layer being an output layer of the second neural network, the second output being based on the second training data; (Rozantsev, Fig.1,  “Our two-stream architecture.  One stream operates on the source data and the other on the target one.  Their weights are not shared.  Instead, we introduce loss functions that prevent corresponding weights from being too different from each other.” Examiner notes each layer outputs data which is used as input for the subsequent layer in the respective stream.  In other words, Fig. 1 shows a first output from a first layer of the first neural network based on the first training data, and a second output from a second layer of the second neural network based on the second training data.)
	based on the first and second outputs, determine a first adjustment to one or more weights of a third layer, the third layer being an intermediate layer of the second neural network; select the third layer and a fourth layer, the fourth layer being an intermediate layer of the first neural network, the third and fourth layers being parallel intermediate layers; (Rozantsev, Fig. 1, and, page 805, column 2, paragraph 2, line  1 “To learn the model parameters, we first pre-train the source stream using the source data only.  We then simultaneously optimize the weights of both streams according to the loss of Eqs. (2), (3), (4), and (5) using both source and target data, with the target stream weights initialized from the pre-trained source weights.” In other words, optimize weights of both streams is based on the first and second outputs, determine a first adjustment to one or more weights of the third layer, from Fig. 1, the layers are intermediate layers, and the third and fourth layers are parallel intermediate layers.)
	compare a third output from the third layer to a fourth output from the fourth layer, the third and fourth outputs being respective outputs of the respective third and fourth layers prior to the third and fourth outputs being respectively provided to subsequent respective layers of the respective neural networks, the third and fourth outputs being respectively based on the second and first training data; (Rozantsev, Fig. 1, see prior mapping. And, page 804, column 1, paragraph 2, line 1 “While our goal is to go beyond sharing the layer weights, we still believe that corresponding weights in the two streams should be related.  This models the fact that the source and target domains are related, and prevents overfitting in the target stream, when only very few labeled samples are available.  Our weight regularizer rw(.)  therefore represents the distance between the source and target weights in a particular layer. In principle, we could take it to directly act on the difference of those weights, and thus write

[Equation image: media_image2.png]
 This, however, would not truly attempt to model the domain shift, for instance to account for different means and ranges of values in the two types of data.  To better model the shift and introduce more flexibility in our model, we therefore propose not to penalize linear transformations between the source and target weights.  We then write our regularizer either by relying on the L2 norm as 
[Equation image: media_image3.png]
  or in an exponential form as 
[Equation image: media_image4.png]
 In both cases, aj and bj are scalar parameters that are different for each layer j ϵ Ω and learned at training time along with all other network parameters.” In other words, the weight regularizer compares the weights of the third and fourth layers, which are parallel to each other and belong to the second and first neural networks, respectively, and their outputs are based respectively on the second and first training data.)
	based on the comparison, determine a second adjustment to the one or more weights of the third layer; and adjust the one or more weights of the third layer based on consideration of both the first adjustment and the second adjustment. (Rozantsev, Fig. 1, see prior mapping. In other words, the first adjustment and second adjustment are done through normal backpropagation through the respective neural networks. Then, the regularization step determines a second adjustment to the one or more weights of the third layer based on a comparison of the weights at the parallel levels, and then makes the adjustment based on the prior adjustment of each neural network and the new comparison. This corresponds to adjusting the one or more weights of the third layer based on consideration of both the first adjustment and the second adjustment.)
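For illustration only, the weight regularizer described in the quoted passage can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Rozantsev: the function names l2_regularizer and exp_regularizer are hypothetical, and because the equations appear in this action only as images, the forms used here (a squared L2 norm of a*theta_s + b - theta_t, and the exponential of that quantity minus one) are assumed from the quoted statement that linear transformations between source and target weights are not penalized.

```python
import math

def l2_regularizer(theta_s, theta_t, a=1.0, b=0.0):
    """Assumed form: squared L2 distance between the target weights and a
    linear transform (a * theta_s + b) of the source weights for one layer."""
    return sum((a * s + b - t) ** 2 for s, t in zip(theta_s, theta_t))

def exp_regularizer(theta_s, theta_t, a=1.0, b=0.0):
    """Assumed exponential form: exp(squared distance) - 1, zero when the
    target weights equal the transformed source weights."""
    return math.exp(l2_regularizer(theta_s, theta_t, a, b)) - 1.0

# Hypothetical weights of one pair of parallel layers (source, target).
source_w = [0.5, -1.0, 2.0]
target_w = [1.0, -2.0, 4.0]

# With a=1, b=0 the regularizer acts directly on the weight difference.
plain = l2_regularizer(source_w, target_w)

# With a=2, b=0 the target weights are an exact linear transform of the
# source weights, so neither form penalizes the pair.
scaled = l2_regularizer(source_w, target_w, a=2.0)
scaled_exp = exp_regularizer(source_w, target_w, a=2.0)
```

In this sketch a and b are passed in as fixed numbers; the quoted passage states that they are learned at training time along with the other network parameters.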
Regarding claim 2,
	Rozantsev teaches the apparatus of claim 1, wherein 
	the second neural network is established by a copy of the first neural network prior to the second training data being provided to the second neural network. (Rozantsev, Fig. 1, and, page 805, column 2, paragraph 2, line  1 “To learn the model parameters, we first pre-train the source stream using the source data only.  We then simultaneously optimize the weights of both streams according to the loss of Eqs. (2), (3), (4), and (5) using both source and target data, with the target stream weights initialized from the pre-trained source weights.” The two-stream architecture starts with identical neural networks. Then the source and target stream weights are initialized to the pre-trained source weights. This is the second neural network is established by a copy of the first neural network prior to the training data being provided to the second neural network.)
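As an illustration of the initialization described in the quoted passage, the sketch below copies a pre-trained source stream into a target stream that is then free to diverge. It is a minimal sketch, not code from Rozantsev; the dictionary layout, layer names, and numeric values are hypothetical.

```python
import copy

# Hypothetical pre-trained source stream: layer name -> list of weights.
source_stream = {"conv1": [0.1, 0.2], "conv2": [0.3], "fc": [0.4, 0.5]}

# The target stream starts as an independent copy of the source stream,
# so the two streams begin identical but do not share weights.
target_stream = copy.deepcopy(source_stream)

# A later update to the target stream leaves the source stream unchanged.
target_stream["conv1"][0] += 0.05
```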
Regarding claim 3,
	Rozantsev teaches the apparatus of Claim 1, wherein
	the third and fourth layers are layers other than output layers. (Rozantsev, Fig. 1. Any of the sets of parallel layers other than the final layers are layers other than output layers.)
Regarding claim 4,
	Rozantsev teaches the apparatus of Claim 3, wherein 
	the third and fourth layers are intermediate hidden layers of the respective neural networks. (Rozantsev, Fig. 1. The parallel convolutional layers are intermediate hidden layers of the respective neural networks.)
Regarding claim 5,
	Rozantsev teaches the apparatus of Claim 1, wherein 
	the first training data is related to the second training data.  (Rozantsev, Fig. 1. The source training set is synthetic images, and the target training set is real images. Both are sets of images, which corresponds to the first training data being related to the second training data.)
Regarding claim 7,
	Rozantsev teaches the apparatus of Claim 5, wherein 
	the first and second neural networks pertain to object recognition, and wherein the first training data is related to the second training data in that the first and second training data both pertain to a same object. (Rozantsev, paragraph 1, line 8, “We demonstrate that this both yields higher accuracy than state-of-the art methods on several object recognition and detection tasks and consistently outperform networks with shared weights in both supervised and unsupervised settings.” In other words, the first and second neural networks pertain to object recognition and the first training data is related to the second training data in that the first and second training data both pertain to the same object.)
Regarding claim 8,
	Rozantsev teaches the apparatus of Claim 1, wherein 
	the instructions are executable by the at least one processor to: compare the third output to the fourth output to determine the similarity of the third output to the fourth output, the similarity evaluated using a first function. (Rozantsev, page 804, column 2, paragraph 4, line 1 “Maximum Mean Discrepancy. As the name suggests, given two sets of data, the MMD measures the distance between the mean of the two sets after mapping each sample to a Reproducing Kernel Hilbert Space (RKHS). In our context, let 
[Equation image: media_image5.png]
 be the feature representation at the last layer of the source stream, and 
[Equation image: media_image6.png]
 of the target stream.” In other words,
[Equation image: media_image6.png]
 is a first function used to compare the similarity of the third output to the fourth output.)

Regarding claim 9,
	Rozantsev teaches the apparatus of Claim 8, wherein 
	the determination of the first adjustment to the one or more weights of the third layer is based on a second function different from the first function.  (Rozantsev, see mapping for claim 8. In other words,
[Equation image: media_image5.png]
 is a second function different from the first function.)
Regarding claim 10,
	Rozantsev teaches the apparatus of Claim 9, wherein 
	the first and second functions are discrepancy functions.  (Rozantsev,  see mapping of claims 8 and 9, and page 804, paragraph 4, line 6 “The squared MMD between the source and target domains can be expressed as
[Equation image: media_image7.png]
  where ф(.) denotes the mapping to RKHS.”  In other words,  
[Equation image: media_image6.png]
and 
[Equation image: media_image5.png]
 are the first and second discrepancy functions for the target stream and the source stream, respectively, that are used to calculate the Maximum Mean Discrepancy (MMD).)
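For illustration, the quoted description of the MMD as the distance between the means of two mapped sample sets can be sketched as follows. This is a minimal sketch, not code from Rozantsev: the function names are hypothetical, and the mapping ф(.) to the RKHS is assumed here to be the identity (a linear kernel) purely to keep the example short.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def squared_mmd_linear(source_feats, target_feats):
    """Squared distance between the means of the two feature sets,
    assuming the identity mapping in place of the RKHS mapping."""
    mu_s = mean_vector(source_feats)
    mu_t = mean_vector(target_feats)
    return sum((s - t) ** 2 for s, t in zip(mu_s, mu_t))

# Hypothetical last-layer features from the source and target streams.
source_feats = [[1.0, 2.0], [3.0, 4.0]]  # mean is [2.0, 3.0]
target_feats = [[2.0, 2.0], [4.0, 6.0]]  # mean is [3.0, 4.0]

mmd2 = squared_mmd_linear(source_feats, target_feats)
```

With these hypothetical features the two means differ by one unit in each component, so the squared discrepancy is 2.0; identical feature sets give zero.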
Claim 11 is a method claim corresponding to apparatus claim 1.  Otherwise, they are the same.  It is implicit that a computer implemented method requires at least one processor and at least one computer storage that is not a transitory signal in order to execute.  Therefore, claim 11 is rejected for the same reasons as claim 1.
Regarding claim 12,
	Rozantsev teaches the method of Claim 11, wherein 
	the one or more weights of the third layer are adjusted by adding together the first adjustment and the second adjustment, the first and second adjustments both pertaining to weight changes.  (Rozantsev, Fig. 1, and see mapping of claim 10.  In other words, the loss function determines the adjustment of weights. After the loss functions are determined for both the source network and the target network, the adjustments are compared by the regularization step in Fig. 1. This corresponds to comparing the third output from the third layer to the fourth output from the fourth layer, and the loss functions correspond to the first and second adjustments both pertaining to weight changes.)
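The adding together of two weight adjustments can be pictured with the sketch below. It is a minimal sketch under stated assumptions, not the applicant's or Rozantsev's implementation: the function name combined_update, the learning rate lr, and the weighting factor lam are hypothetical, and the second adjustment is assumed to come from the gradient of a plain squared difference between target and source weights.

```python
def combined_update(theta_t, theta_s, task_grad, lr=0.1, lam=0.5):
    """Return updated target-layer weights plus the two adjustments.

    first_adj  : gradient-descent step from the task loss gradient.
    second_adj : gradient-descent step from an assumed regularizer
                 lam * sum((t - s)^2), whose gradient is 2 * lam * (t - s).
    """
    first_adj = [-lr * g for g in task_grad]
    second_adj = [-lr * lam * 2.0 * (t - s) for t, s in zip(theta_t, theta_s)]
    # The two adjustments are added together and applied to the weights.
    new_theta = [t + d1 + d2 for t, d1, d2 in zip(theta_t, first_adj, second_adj)]
    return new_theta, first_adj, second_adj

# Hypothetical target weights, source weights, and task-loss gradient.
new_t, first_adj, second_adj = combined_update(
    [1.0, 2.0], [0.0, 2.0], [1.0, -1.0], lr=0.5, lam=1.0
)
```

In this example the second weight already matches its source counterpart, so only the first adjustment moves it.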
Regarding claim 13,
	Rozantsev teaches the method of Claim 11, comprising: 
	determining the first adjustment to one or more weights of the third layer using a first loss function; and comparing the third output to the fourth output using a second loss function different from the first loss function to determine the second adjustment.  (Rozantsev, Fig. 1, see prior mapping.  In other words, the loss function determines the adjustment to one or more weights of the third layer. After the loss functions are determined for both the source network and the target network, the adjustments are compared by the regularization step in Fig. 1. This corresponds to comparing the third output from the third layer to the fourth output from the fourth layer using a second loss function.)
Regarding claim 14,
	Rozantsev teaches an apparatus, comprising: 
	at least one computer storage that is not a transitory signal and that comprises instructions executable by at least one processor to: access a first domain, the first domain being associated with a first domain genre; access a second domain, the second domain being associated with a second domain genre different from the first domain genre; using training data provided to the first and second domains, classify a target data set; and output a classification of the target data set.  (Rozantsev, Fig. 1, and page 1, paragraph 1, line 1 “The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical.  Domain Adaptation has therefore emerged as a solution to this problem;” Examiner notes, computer storage, executable instructions, and at least one processor have been previously mapped. See mapping of claim 1. In other words, the first domain genre is synthesized images, the second domain genre is real images, synthesized images are different from real images is the second domain being associated with a second domain genre different from the first domain genre, and the two-stream architecture is a domain adaptation module (from mapping of claim 1) which is a classifier that outputs a classification of the target data set.)
Regarding claim 18,
	Rozantsev teaches the apparatus of Claim 14, wherein 
	the target data set is classified at least in part based on execution of a domain adaptation module established at least in part by a loss function.  (Rozantsev, Fig.1, and page 801, column 2, paragraph 3, line 2 “To this end, we introduce the two-stream architecture depicted by Fig. 1.” And, page 801, column 2, paragraph 3, line 6 “To nonetheless encode the fact that both streams tackle the same recognition problem, albeit in different domains, we introduce a loss function that relates the corresponding weights in both layers.” In other words, from Fig. 1, the target data set is classified, the two-stream architecture is a domain adaptation module, and loss function that relates the corresponding weights is at least in part by a loss function.)
Regarding claim 19,
	Rozantsev teaches the apparatus of Claim 14, wherein 
	the target data set is classified by a domain adaptation module receiving input from multiple output points from the first and second domains of training data.  (Rozantsev, Fig. 3, and, page 805, column 2, paragraph 5, line 5 “We then demonstrate that it generalizes well to other classification problems by testing it on the Office, MNIST+USPS and MNIST+SVHN datasets.”

[Figure image: media_image8.png]

In other words, two-stream architecture is a domain adaptation module, and testing it on Office, MNIST+USPS  and MNIST+SVHN datasets is target data set is classified on input from multiple output points from the first and second domains of the training data. See Fig. 3 for a depiction of two domain genres (synthetic and real) of training data and the test dataset for classification of the UAV dataset.)
Regarding claim 20,
	Rozantsev teaches the apparatus of Claim 19, wherein 
	the domain adaptation module uses a discrepancy function to calculate a distance of overall data distribution between source and target data.  (Rozantsev, See mapping of claim 10. In other words, two stream architecture is domain adaptation module and Maximum Mean Discrepancy (MMD) is discrepancy function that calculates a distance of overall data distribution between source and target data.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Rozantsev, and Jamal et al (Deep Domain Adaptation in Action Space, herein Jamal).
Regarding claim 6,
	Rozantsev teaches the apparatus of Claim 5, wherein 
	Thus far, Rozantsev does not explicitly teach the first and second neural networks pertain to action recognition, and wherein the first training data is related to the second training data in that the first and second training data both pertain to a same action. 
	Jamal teaches the first and second neural networks pertain to action recognition, and wherein the first training data is related to the second training data in that the first and second training data both pertain to a same action. (Jamal, page 1, paragraph 1, “In this paper, we investigate the problem of Domain Shift in action videos, an area that has remained under-explored, and propose two new approaches named Action Modeling on Latent Subspace (AMLS) and Deep Adversarial Action Adaptation (DAAA). In the AMLS approach, the action videos in the target domain are modeled as a sequence of points on a latent subspace and adaptive kernels are successively learned between the source domain point and the sequence of target domain points on the manifold… The action adaptation experiments were conducted using various combinations of multi-domain action datasets, including six common classes of OLYMPIC Sports and UCF50 datasets and all classes of KTH, MSR and our own SonyCam datasets.” In other words, action adaptation pertains to action recognition, and the source domain and target domain under domain shift correspond to the first and second training data both pertaining to the same action.)
Regarding claim 15,
	Rozantsev teaches the apparatus of Claim 14, wherein
	Thus far, Rozantsev does not explicitly teach the first domain comprises real world video data and the second domain comprises computer game video data.  
	Jamal teaches the first domain comprises real world video data and the second domain comprises computer game video data (Jamal, page 3, paragraph 6, line 1 “All the DA techniques found in the literature address the image/object classification problem.  In fact, we could hardly find any work on the video-to-video domain adaptation problem.  There are a few studies [6, 19, 31] on cross-view action recognition and a few on heterogeneous domain adaptation [8,9].  In that sense, to the best of our knowledge, this paper is one of the first few papers for the video-video domain adaptation.” And, page 1, paragraph 1, line 14 “The action adaptation experiments were conducted using various combinations of multi-domain action datasets, including six common classes of Olympic Sports and UCF50 datasets and all classes of KTH, MSR and our own SonyCam datasets.”  Examiner notes that the MSR datasets include skeleton data in screen coordinates (MSRAction3DSkeleton (20joints).rar), (e.g. https://wangjiangb.github.io/my_data.html, page is also included in 892) which is virtual and therefore equivalent to a computer game video data set. In other words, video-video is the first domain comprises real world video data and the second domain comprises computer game video data.)
Both Rozantsev and Jamal are directed to deep domain adaptation, among other things.  Rozantsev teaches domain adaptation for images but does not specifically teach domain adaptation for detecting action in videos.  Jamal teaches domain adaptation for detecting action in videos.  In view of the teaching of Rozantsev, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Jamal into Rozantsev.  This would result in being able to perform transfer learning between two domains of videos for action detection.
One of ordinary skill in the art would be motivated to do this because it is important for the safety of the public to be able to understand what is happening in videos. (Jamal, page 1, paragraph 2, line 1 “Today, surveillance cameras are everywhere, be it city streets, market place, buildings or airports.  These cameras operate 24x7, generating a massive amount of video data that needs to be processed for autonomous understanding of events and activities occurring in the scene.”)
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Rozantsev, and Deng et al (Sparse Autoencoder-based Feature Transfer Learning for Speech Emotion Recognition, herein Deng).

Regarding claim 16,
Rozantsev teaches the apparatus of Claim 14, wherein
	Thus far, Rozantsev does not explicitly teach the first domain comprises information pertaining to a first voice and the second domain comprises information pertaining to a second voice.
	Deng teaches the first domain comprises information pertaining to a first voice and the second domain comprises information pertaining to a second voice. (Deng, Algorithm 1, and page 511, column 1, paragraph 1, line 6 “In this context, this paper presents a sparse autoencoder method for feature transfer learning for speech emotion recognition.” And, page 514, column 2, paragraph 1, line 2 “Then, we move to transferring information from the source to the target domain.”  And, page 514, column 2, paragraph 3, line 1 “As stated, we treat FAU AEC as target set, which consists of a training and test partition (roughly half and half) naturally given by recordings at different elementary schools.”

[Figure image: media_image9.png]

 In other words, from Algorithm 1, Ts and Tt are a first domain comprises information pertaining to a first voice and the second domain comprises information pertaining to a second voice.)
Both Rozantsev and Deng are directed to domain adaptation, among other things.  Rozantsev teaches domain adaptation for images but does not specifically teach domain adaptation for voices.  Deng teaches domain adaptation for voices.  In view of the teaching of Rozantsev, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Deng into Rozantsev.  This would result in being able to transfer learning between two voice domains for detecting emotion.
One of ordinary skill in the art would be motivated to do this in order to reduce the level of human effort required to label captured data. (Deng, page 1, paragraph 2, line 15 “When labelling emotional corpora, even worse, there is no certain ground truth but a subjective ambiguous ‘gold standard’ as given by majority voting of several human raters which may be in considerable disagreement.  To reduce human label effort, either to annotate new data or bridge the gap between corpora annotated in different ways, speech emotion recognition is in need of a method to reuse existing corpora and retrieve useful information within corpora for a related target task.”)
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Rozantsev, and Xu et al (A Unified Framework for Metric Transfer Learning, herein Xu).
Regarding claim 17,
	Rozantsev teaches the apparatus of Claim 14, wherein
	Thus far, Rozantsev does not explicitly teach the first domain pertains to standard font text and the second domain pertains to cursive script.  
	Xu teaches the first domain pertains to standard font text and the second domain pertains to cursive script. (Xu, page 1164, column 2, paragraph 1, line 1 “The USPS dataset and the MNIST dataset are widely used in computer vision and pattern recognition. The USPS dataset consists of 9,298 labeled images, each of which is of the size of 16 X 16.  The MNIST dataset consist of 60,000 training images and 10,000 test images, each of which is of the size of 28 X 28.  Note that, the USPS and MNIST datasets are subject to different distributions and both contain 10 categories.  We construct two handwriting recognition tasks usps-mnist and mnist-usps based on the two datasets.  For example, the task usps-mnist denotes that USPS is used as the source domain and MNIST is used as the target domain.” In other words, USPS is a handwritten text dataset and MNIST is a cursive dataset.)
Both Rozantsev and Xu are directed to domain adaptation, among other things.  Rozantsev teaches domain adaptation for images but does not specifically teach domain adaptation for text and cursive script.  Xu teaches domain adaptation for text and cursive script.  In view of the teaching of Rozantsev, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Xu into Rozantsev.  This would result in being able to transfer learning between domains of images as well as domains of text and cursive script.
	One of ordinary skill in the art would be motivated to do this in order to make transfer learning more effective for application to real-world problems such as domain adaptation between text and cursive script. (Xu, page 1, paragraph 1, line 8 “Unlike previous work where instance weights and Mahalanobis distance are trained in a pipelined framework that potentially leads to error propagation across different components, MTLF attempts to learn instance weights and a Mahalanobis distance in a parallel framework to make knowledge transfer across domains more effective.” And, page 1, column 1, paragraph 2, “In practice, transfer learning is desirable to many real-world applications.”)
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124