Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 1/28/21 and 1/19/22 are being considered by the examiner.

Priority
The claim limitations “self-supervised learning”, “an attribute dictionary of abstract attributes”, and “zero-shot attribution distillation” have no corresponding disclosure in provisional patent application No. 62/752,166 or U.S. Patent Application No. 16/532,321, and are therefore not entitled to the earlier filing dates of those applications. The effective filing date of these claim limitations is Dec. 10, 2019.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Independent claims 1, 6, and 11 recite “optimal pseudo-task”. Applicant does not provide a clear definition of an optimal pseudo-task, and the claims do not set forth the criterion by which a pseudo-task is deemed optimal. For purposes of examination, the limitation “optimal pseudo-task” is interpreted as a “pseudo-task” identified by any algorithm.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-17 are rejected under 35 U.S.C. 103 as being unpatentable over Computer Vision-ECCV 2008 (10th European Conference on Computer Vision, Marseille, France, Oct. 2008, Proceedings, Part III, LNCS 5304; hereinafter ECCV) in view of Goodfellow et al. (Deep Learning, ISBN 9780262035613, published 2016), Ye et al. (Self-Training Ensemble Networks for Zero-Shot Image Recognition, cited in IDS), and Baheti et al. (US 2019/0101634).
As to Claim 1, ECCV teaches A system for learning object labels for control of an autonomous platform, the system comprising: 
one or more processors and a non-transitory computer-readable medium having executable instructions encoded thereon such that when executed, the one or more processors perform operations of: 
performing pseudo-task optimization to identify an optimal pseudo-task for each source model of one or more source models (ECCV,
[Image: ECCV excerpt (media_image1.png)]
[Image: ECCV excerpt (media_image2.png)]
[Image: ECCV excerpt (media_image3.png)]
);
training an initial target network with self-supervised learning using the optimal pseudo-task (ECCV,
[Image: ECCV excerpt (media_image4.png)]
[Image: ECCV excerpt (media_image5.png)]
);
extracting a plurality of source image components from the one or more source models; generating an attribute dictionary of abstract attributes from the plurality of source image components (ECCV,
[Image: ECCV excerpt (media_image6.png)]
Here, ECCV teaches features extracted by the feature extractors of a CNN; see also Fig. 5. ECCV does not expressly use the claim language “abstract attributes”. Goodfellow, however, explains abstract attributes at p. 6, as shown below:
[Image: Goodfellow excerpt, p. 6 (media_image7.png)]
);
aligning a set of unlabeled target data with the one or more source models that are similar to the set of unlabeled target data (ECCV,
[Image: ECCV excerpt (media_image8.png)]
);
mapping the set of unlabeled target data onto a plurality of abstract attributes in the attribute dictionary (ECCV discloses “When the input space X represents images, the inclusion of related tasks would help induce similarity measures between images that enhances the generalization of the main task being learned. The nature of this similarity measure depends on the architecture of the learning system. For instance, in a feed-forward Neural Network (NN) with one hidden layer, all tasks would share the same hidden representation (feature space) Φ(x) (see Fig. 1-b) and thus the inclusion of pseudo tasks in this architecture would hopefully result in constraining the model to map semantically similar points like a and b, from the input space, to nearby positions in the feature space” at p. 72. Here, the feature space corresponds to the attribute dictionary of abstract attributes; see also Goodfellow, p. 7);
generating a new target network from the mapping (ECCV discloses “It is interesting to contrast our approach with the layer-wise training one in [20]. In [20], each feature extraction layer is trained to model its input in a layer-wise fashion: the first layer is trained on the raw images and then used to produce the input to the second feature extraction layer. The whole resulting architecture is then used as a multilayered feature extractor over labeled data, and the resulting representation is then used to feed an SVM classifier. On contrast, in our approach, we jointly train the classifier and the feature extraction layers, thus the feature extraction layer training is guided by the pseudo-tasks as well as the labeled information simultaneously” at p. 75);
using the new target network, assigning an object label to an object in the unlabeled target data (ECCV discloses “In this figure it is clear that input points a and b1 have similar values across all of these tasks, and thus one can conclude that these two input points are semantically similar, and therefore should be assigned similar values under other related tasks” under section 3.1 Basic; “We use a set of pseudo tasks to incorporate prior knowledge into the training of recognition models. Therefore, these tasks need to be 1) automatically computable based on unlabeled images, and 2) relevant to the specific recognition task at hand, in other words, it is highly likely that two semantically similar images would be assigned similar outputs under a pseudo task” under section 5 Generating Pseudo Tasks).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of ECCV with the teaching of Goodfellow so as to provide a deep learning model that may include a series of hidden layers extracting increasingly abstract features from the input images (Goodfellow, p. 6).
ECCV teaches aligning the sources with the target by learning a latent space shared by the source and target models, which predict the attributes of the source and target data. ECCV does not expressly use the claim language “zero-shot attribute distillation”. Ye, however, teaches zero-shot learning. For example, Ye discloses “Zero-shot learning (ZSL) aims to transfer knowledge from labeled classes into unlabeled classes to reduce human labeling effort… A self-training framework is then deployed to iteratively label the most confident images in each unlabeled class with predicted pseudo-labels and update the ensemble network with the training data augmented by the pseudo-labels” in the Abstract; “As a special unsupervised domain adaptation, ZSL aims to transfer information from the source domain, a set of training classes with labeled data, to make predictions in the target domain, a set of test classes with only unlabeled data… Existing zero-shot image recognitions have centered on deploying label embeddings in a common semantic space, e.g., in terms of high level visual attributes, to bridge the domain gap between seen and unseen classes” at p. 1; see also the zero-shot classification function at p. 4.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of ECCV and Goodfellow with the teaching of Ye so as to provide zero-shot learning that learns a source semantic dictionary on seen classes and then uses it to regularize the learning of a target semantic dictionary on unseen instances (Ye, p. 3).
ECCV, Goodfellow, and Ye do not explicitly teach controlling the autonomous platform. Baheti, however, teaches the following limitation:
controlling the autonomous platform based on the assigned object label (Baheti discloses “In some embodiments, in response to a determination that an object is approaching the vehicle 200, the processing circuitry 134 may classify or assign a label to the object (e.g. as a pedestrian, bicycle, motorcycle, or car). Such determinations may be made by signal processing steps, including target classification, machine learning” in [0047]; “Consequently, in response to the final label of the object being a motorcycle or an automobile, the processing circuitry 134 may provide a control signal to the controller 216 that triggers the controller 216 to deploy the vehicle's airbags and/or alert an emergency response team. (in step 422) In some embodiments, the control signal may further trigger the controller 216 to turn off the engine of the vehicle 200” in [0069]. Here, the vehicle corresponds to the autonomous platform.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of ECCV, Goodfellow, and Ye with the teaching of Baheti so as to control a vehicle in response to the final label assigned to the object (Baheti, [0069]).

As to Claim 2, ECCV in view of Goodfellow, Ye and Baheti teaches The system as set forth in Claim 1, wherein the set of unlabeled target data is an input image, and wherein mapping the unlabeled target data onto abstract attributes further comprises:
dissecting the input image into a plurality of target image components (ECCV discloses feature extractors that extract features from the target images at p. 72, and a feature extraction layer at p. 75. Goodfellow also discloses “See Collobert (2011) for an example of deep learning applied to a parsing task. Another example is pixel-wise segmentation of images, where the computer program assigns every pixel in an image to a specific category” at p. 98);
comparing the plurality of target image components with the plurality of source image components (ECCV,
[Image: ECCV excerpt (media_image8.png)]
);
assigning the object label to the object based on the comparison (ECCV discloses “We use a set of pseudo tasks to incorporate prior knowledge into the training of recognition models. Therefore, these tasks need to be 1) automatically computable based on unlabeled images, and 2) relevant to the specific recognition task at hand, in other words, it is highly likely that two semantically similar images would be assigned similar outputs under a pseudo task” at p. 75. See also Ye, Sections 3.1.1 and 3.1.3, reproduced below:
[Image: Ye excerpt (media_image9.png)]
[Image: Ye excerpt (media_image10.png)]
);
generating an executable control script appropriate for the object label; and causing the autonomous platform to execute the control script and perform an action corresponding to the control script (Baheti discloses “Consequently, in response to the final label of the object being a motorcycle or an automobile, the processing circuitry 134 may provide a control signal to the controller 216 that triggers the controller 216 to deploy the vehicle's airbags and/or alert an emergency response team. (in step 422) In some embodiments, the control signal may further trigger the controller 216 to turn off the engine of the vehicle 200” in [0069].)

As to Claim 3, ECCV in view of Goodfellow, Ye and Baheti teaches The system as set forth in Claim 1, wherein a source similarity graph is used to select the one or more source models that are similar to the set of unlabeled target data and performing pseudo-task optimization further comprises:
computing a similarity measure between the one or more source models
(ECCV discloses “Based on these feature descriptors, a similarity measure is induced over images… This similarity measure is then used to train a discriminative classifier” at p. 70; “Recently [20] proposed a layer-wise greedy algorithm that utilizes unlabeled data for pre-training CNNs. More recently, in [13], the authors proposed to train a feed-forward model jointly with an unsupervised embedding task, which also leads to improved results” at p. 71; “When the input space X represents images, the inclusion of related tasks would help induce similarity measures between images that enhances the generalization of the main task being learned” at p. 72. Ye further discloses “the similarity score between the seen classes, S, and a randomly selected subset of the unseen classes” at p. 4; “We first make predictions using each of the K classifiers based on similarity scores” at p. 5);
generating the source similarity graph based on the similarity measure; and using the source similarity graph, identifying one or more source models that are similar to the set of unlabeled target data (ECCV, p. 670,
[Image: ECCV excerpt (media_image11.png)];
see also Fig. 4; “the only way that the NN can satisfy these requirements, is to map points like a and b to nearby position in the feature space” at p. 73. See also Ye, Sections 3.1.1 and 3.1.3, reproduced below:
[Image: Ye excerpt (media_image9.png)]
[Image: Ye excerpt (media_image10.png)]
).

As to Claim 4, ECCV in view of Goodfellow, Ye and Baheti teaches The system as set forth in Claim 1, wherein extracting the plurality of source image components and generating the attribute dictionary further comprises:
generating the plurality of source image components for each source model using unsupervised data decomposition; mapping the plurality of source image components and their corresponding labels onto the plurality of abstract attributes, resulting in clusters of abstract attributes; and generating the attribute dictionary from the clusters of abstract attributes (ECCV discloses “More recently, in [13], the authors proposed to train a feed-forward model jointly with an unsupervised embedding task, which also leads to improved results. Though using unlabeled data too…In the setting considered in this paper, all tasks share the same input space (X) and each task m can be viewed as a function fm that maps between this space to an output space: fm : X → Y . Intuitively, if the tasks are truly related, then there is a shared structure between all of all them that can be leveraged by learning them in parallel” at p. 71;
[Image: ECCV excerpt (media_image12.png)]
Goodfellow further discloses the following at p. 6:
[Image: Goodfellow excerpt, p. 6 (media_image7.png)]
Goodfellow also discloses “Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset. In the context of deep learning, we usually want to learn the entire probability distribution that generated a dataset, whether explicitly, as in density estimation, or implicitly, for tasks like synthesis or denoising. Some other unsupervised learning algorithms perform other roles, like clustering, which consists of dividing the dataset into clusters of similar examples” at p. 102.)

As to Claim 5, ECCV in view of Goodfellow, Ye and Baheti teaches The system as set forth in Claim 1, wherein the autonomous platform is a vehicle, and wherein the one or more processors further perform an operation of causing the vehicle to perform a driving operation in accordance with the assigned object label (Baheti discloses “Consequently, in response to the final label of the object being a motorcycle or an automobile, the processing circuitry 134 may provide a control signal to the controller 216 that triggers the controller 216 to deploy the vehicle's airbags and/or alert an emergency response team. (in step 422) In some embodiments, the control signal may further trigger the controller 216 to turn off the engine of the vehicle 200” in [0069]. Here, the vehicle corresponds to the autonomous platform.)

Claim 6 recites limitations similar to those of claim 1, but in method form. Therefore, the same rationale used for claim 1 is applied.
Claim 7 is rejected based upon similar rationale as Claim 2.
Claim 8 is rejected based upon similar rationale as Claim 3.
Claim 9 is rejected based upon similar rationale as Claim 4.
Claim 10 is rejected based upon similar rationale as Claim 5.
Claim 11 recites limitations similar to those of claim 1, but in computer-readable medium form. Therefore, the same rationale used for claim 1 is applied.
Claim 12 is rejected based upon similar rationale as Claim 2.
Claim 13 is rejected based upon similar rationale as Claim 3.
Claim 14 is rejected based upon similar rationale as Claim 4.
Claim 15 is rejected based upon similar rationale as Claim 5.
Claim 16 is rejected based upon similar rationale as Claims 1 and 3.

As to Claim 17, ECCV in view of Goodfellow, Ye and Baheti teaches The method as set forth in Claim 16, further comprising an acts of:
collecting data from the new target network; and propagating object labels in a latent feature space, resulting in an improved dictionary of abstract attributes and a refined target network (ECCV discloses “When the input space X represents images, the inclusion of related tasks would help induce similarity measures between images that enhances the generalization of the main task being learned. The nature of this similarity measure depends on the architecture of the learning system. For instance, in a feed-forward Neural Network (NN) with one hidden layer, all tasks would share the same hidden representation (feature space) Φ(x) (see Fig. 1-b) and thus the inclusion of pseudo tasks in this architecture would hopefully result in constraining the model to map semantically similar points like a and b, from the input space, to nearby positions in the feature space… As depicted in Fig. 1.b, all the tasks share the hidden layer feature mapping” at p. 72. Here, the shared hidden feature space corresponds to a latent feature space. Goodfellow further explains that “The input is presented at the visible layer, so named because it contains the variables that we are able to observe. Then a series of hidden layers extracts increasingly abstract features from the image. These layers are called “hidden” because their values are not given in the data; instead the model must determine which concepts are useful for explaining the relationships in the observed data. The images here are visualizations of the kind of feature represented by each hidden unit” at p. 6; see also Fig. 1.2 and p. 573.)

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE whose telephone number is (571)270-1221.  The examiner can normally be reached on Monday-Friday, 8:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached on 571-272-2976.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/WEIMING HE/
Primary Examiner, Art Unit 2612