DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in the HELLENIC REPUBLIC on 1/27/2020. It is noted, however, that applicant has not filed a certified copy of the GR20200100034 application as required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/17/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, Jiaqi, et al. "Snr: Sub-network routing for flexible parameter sharing in multi-task learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019 (“Ma”) in view of Misra, Ishan, et al. "Cross-stitch networks for multi-task learning." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016 (“Misra”) and further in view of Jang, Eric, et al. "Categorical reparameterization with gumbel-softmax." arXiv preprint arXiv:1611.01144 (2016) (“Jang”).
Regarding claim 1, Ma teaches a computer-implemented method for training a machine-learned model for flexible-multi-task learning, the machine-learned model configured to perform a plurality of tasks (Ma, pg. 5, “We use YouTube8M… as our benchmark dataset to evaluate the effectiveness of the proposed methods. This dataset consists of 6.1 million of YouTube videos, each with (multiple) labels from a vocabulary of more than 3,000 topical entities. The topical entities can be further grouped into 24 top-level topic categories. To create a multi-task learning problem from the dataset, we treat each top-level topic category as a separate prediction task, so that each task is a multi-label classification problem. To ensure data quantity per task, we used the top 16 categories in data volume. We use the training set provided in the original dataset as our training set, and split the original validation set into our own validation set and test set….”), the machine-learned model comprising a plurality of layers, each layer comprising a plurality of components (Ma, pgs. 2-6, Figure 1 is detailed below: 

[Image: media_image1.png, reproduction of Ma, Figure 1]

It details a Sub-Network Routing with Transformation (SNR-Trans) model and a Sub-Network Routing with Average (SNR-Aver) model consisting of two tasks, Task A and Task B, in which the sub-networks below Tower A and Tower B are made up of a plurality of layers and linear transformation components.), each task assigned to select one or more components for each layer according to a connection probability matrix for the layer (Ma, pgs. 3-4, “Suppose there are two subsequent layers of sub-networks, and the lower-level layer has 3 sub-networks and the higher-level layer has 2 sub-networks. Let $u_1, u_2, u_3$ be the outputs of the lower-level sub-networks and let $v_1, v_2$ be the inputs of the higher-level sub-networks. Then SNR-Trans can be formulated as $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$, where $W_{ij}$ is a transformation matrix from the jth lower level sub-network to ith higher-level sub-network and z represents the coding variables (a group of binary variables controlling the connection). Similarly, SNR-Aver can be formulated as $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ where $I_{ij}$ is an identity matrix for all i,j… we propose to model the coding variables z as latent random variables from parameterized distributions, and learn the distribution parameters and model parameters simultaneously.” Ma teaches: $\begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix}$
                                              
                            
                                
                                    
                                        
                                            
                                                
                                                    
$\begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix}$
where z represents the coding variables, a group of binary variables controlling the connection [i.e. each task assigned to select one or more components for each layer according to a connection probability matrix for the layer]), the method comprising: obtaining a test input; selecting a particular task from the one or more tasks (Ma, pgs. 5-6, see also fig. 5, “We use YouTube8M… as our benchmark dataset to evaluate the effectiveness of the proposed methods. This dataset consists of 6.1 million of YouTube videos, each with (multiple) labels from a vocabulary of more than 3,000 topical entities. The topical entities can be further grouped into 24 top-level topic categories. To create a multi-task learning problem from the dataset, we treat each top-level topic category as a separate prediction task, so that each task is a multi-label classification problem. To ensure data quantity per task, we used the top 16 categories in data volume [i.e. selecting a particular task from the one or more tasks]. We use the training set provided in the original dataset as our training set, and split the original validation set into our own validation set and test set [i.e. obtaining a test input]”); and training the machine-learned model for the particular task, wherein training the machine-learned model for the particular task comprises: performing a forward pass using the test input and one or more connection probability matrices to generate a sample distribution of test outputs; training the components of the machine-learned model based at least in part on the sample distribution (Ma, pg. 5, “The whole model, including both model parameters W and latent variable distribution parameters $\log(\alpha)$, is trained by stochastic gradient based optimization. For each mini-batch in the forward pass, we first sample a group of uniform random variables u, then calculate z to obtain the network architecture, and finally feed the input data into the model to compute the loss.”); and performing a backwards pass to train the connection probability matrix (Ma, pg. 5, “The gradients w.r.t. W and $\log(\alpha)$ are calculated by back-propagation… [a]t serving time, the following estimator… is used for z, $\hat{z} = \min(1, \max(0, \mathrm{sigmoid}(\log(\alpha))(\zeta - \gamma) + \gamma))$.”).
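For illustration only, the forward-pass sampling and the serving-time estimator quoted from Ma can be sketched in Python as follows. This sketch is not taken from Ma or from the application. The training-time sampling formula is the commonly used hard concrete relaxation, which is assumed here because the quoted passage does not reproduce it; the function names and the default values of beta, zeta, and gamma are likewise hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def serving_estimate(log_alpha, zeta=1.1, gamma=-0.1):
    """Deterministic serving-time estimator quoted from Ma:
    z_hat = min(1, max(0, sigmoid(log_alpha) * (zeta - gamma) + gamma)).
    The defaults for zeta and gamma are assumed values, not taken from Ma.
    """
    return min(1.0, max(0.0, sigmoid(log_alpha) * (zeta - gamma) + gamma))


def sample_z(log_alpha, rng, beta=2.0 / 3.0, zeta=1.1, gamma=-0.1):
    """Training-time sample of one coding variable z (assumed hard concrete form).

    A uniform random variable u is drawn, pushed through a sigmoid together
    with log_alpha, stretched to the interval (gamma, zeta), then clipped to [0, 1].
    """
    u = rng.uniform(1e-6, 1.0 - 1e-6)  # keep u away from 0 and 1 so log() is finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)
    return min(1.0, max(0.0, s * (zeta - gamma) + gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    print("sampled z:", sample_z(0.0, rng))
    print("serving-time z_hat:", serving_estimate(0.0))
```

Under these assumed defaults, a strongly positive log(α) drives the serving-time estimate to 1 (connection kept) and a strongly negative log(α) drives it to 0 (connection dropped).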
	Ma does not teach: wherein the connection probability matrix comprises a matrix indicative of a probability of a particular component being activated such that an input into the machine-learned model is routed through the activated components to generate an output. 
	However, Misra teaches: wherein the connection probability matrix comprises a matrix indicative of a probability of a particular component being activated such that an input into the machine-learned model is routed through the activated components to generate an output (Misra, pg. 3996, see also figs. 3 and 4, “At each layer of the network, we model sharing of representations by learning a linear combination of the activation maps…using a cross-stitch unit. Given two activation maps $x_A$, $x_B$ from layer l for both the tasks, we learn linear combinations $\tilde{x}_A$, $\tilde{x}_B$…of both the input activations and feed these combinations as input to the next layers’ filters. This linear combination is parameterized using α. Specifically, at location (i, j) in the activation map, $\begin{bmatrix} \tilde{x}_A^{ij} \\ \tilde{x}_B^{ij} \end{bmatrix} = \begin{bmatrix} \alpha_{AA} & \alpha_{AB} \\ \alpha_{BA} & \alpha_{BB} \end{bmatrix} \begin{bmatrix} x_A^{ij} \\ x_B^{ij} \end{bmatrix}$.” Misra teaches: $\begin{bmatrix} \alpha_{AA} & \alpha_{AB} \\ \alpha_{BA} & \alpha_{BB} \end{bmatrix}$ [i.e. a matrix indicative of a probability of a particular component being activated] $\begin{bmatrix} x_A^{ij} \\ x_B^{ij} \end{bmatrix}$ [i.e. such that an input into the machine-learned model] $\begin{bmatrix} \tilde{x}_A^{ij} \\ \tilde{x}_B^{ij} \end{bmatrix}$ [i.e. is routed through the activated components to generate an output]).
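For illustration only, the cross-stitch combination quoted from Misra can be sketched in Python as follows. The function name and the numeric values are hypothetical, and the activation maps are flattened to plain lists so that each list index stands in for one location (i, j).

```python
def cross_stitch(x_a, x_b, alpha):
    """Apply a 2x2 cross-stitch matrix at every location of two activation maps.

    x_a, x_b: activations of task A and task B, one value per location.
    alpha:    [[a_AA, a_AB], [a_BA, a_BB]], the learned mixing weights.
    Returns the mixed activations passed on to the next layer of each task.
    """
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    out_a = [a_aa * a + a_ab * b for a, b in zip(x_a, x_b)]
    out_b = [a_ba * a + a_bb * b for a, b in zip(x_a, x_b)]
    return out_a, out_b


if __name__ == "__main__":
    # Hypothetical weights: each task mostly keeps its own representation.
    alpha = [[0.9, 0.1], [0.2, 0.8]]
    print(cross_stitch([1.0, 2.0], [3.0, 4.0], alpha))
```

With the identity matrix as α the two tasks share nothing, and larger off-diagonal entries mix more of the other task's activations into each output.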
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the teachings of Misra. The motivation to do so would be to incorporate cross-stitch units for parameter sharing in multi-task networks (Misra, pg. 3995, “This paper proposes cross-stitch units, using which a single network can capture all these Split-architectures (and more). It automatically learns an optimal combination of shared and task-specific representations. We demonstrate that such a cross-stitched network can achieve better performance than the networks found by brute-force enumeration and search.”).
Ma does not teach: of the machine-learned model using an approximation. 
However, Jang teaches: of the machine-learned model using an approximation (Jang, pg. 3, “For scenarios in which we are constrained to sampling discrete values… we discretize y using arg max but use our continuous approximation in the backward pass by approximating $\nabla_\theta z \approx \nabla_\theta y$ [i.e. of the machine-learned model using an approximation]. We call this the Straight-Through (ST) Gumbel Estimator… ST Gumbel-Softmax allows samples to be sparse even when the temperature $\tau$ is high.”).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the teachings of Jang. The motivation to do so would be to make categorical distributions differentiable so that neural networks can be trained with stochastic gradient descent and backpropagation (Jang, pg. 1, “[S]tochastic networks with discrete variables are difficult to train because the backpropagation algorithm - while permitting efficient computation of parameter gradients - cannot be applied to non-differentiable layers… [t]he practical outcome of this paper is a simple, differentiable approximate sampling mechanism for categorical variables that can be integrated into neural networks and trained using standard backpropagation.”).
Regarding claim 2, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 1, wherein the approximation comprises a straight-through Gumbel-Softmax approximation (Jang, pg. 3, “For scenarios in which we are constrained to sampling discrete values… we discretize y using arg max but use our continuous approximation in the backward pass by approximating $\nabla_\theta z \approx \nabla_\theta y$. We call this the Straight-Through (ST) Gumbel Estimator… ST Gumbel-Softmax allows samples to be sparse even when the temperature $\tau$ is high.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Jang for the same rationale stated at Claim 1.
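For illustration only, the straight-through Gumbel-Softmax estimator quoted from Jang can be sketched as follows. This is a minimal sketch in plain Python and is not code from Jang, Ma, Misra, or the application; the function names `sample_gumbel`, `softmax`, `st_gumbel_softmax`, and `st_backward` are hypothetical. The forward pass emits the discrete one-hot sample z, while the backward pass substitutes the gradient of the continuous relaxation y, i.e. $\nabla_\theta z \approx \nabla_\theta y$.

```python
import math
import random


def sample_gumbel(rng):
    """Draw g ~ Gumbel(0, 1) via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def st_gumbel_softmax(log_pi, tau, rng):
    """Forward pass: return (z, y).

    y is the continuous Gumbel-Softmax sample at temperature tau;
    z is y discretized with arg max (a one-hot list).
    """
    noisy = [(lp + sample_gumbel(rng)) / tau for lp in log_pi]
    y = softmax(noisy)
    k = max(range(len(y)), key=y.__getitem__)
    z = [1.0 if i == k else 0.0 for i in range(len(y))]
    return z, y


def st_backward(y, tau, grad_z):
    """Backward pass: treat dL/dz as dL/dy (the straight-through step),
    then backpropagate through the softmax to obtain dL/d(log_pi)."""
    dot = sum(g * yi for g, yi in zip(grad_z, y))
    return [yi * (g - dot) / tau for yi, g in zip(y, grad_z)]
```

In this sketch the sample z stays sparse (one-hot) at any temperature, while the gradient that reaches the logits is the gradient of the smooth sample y.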
Regarding claim 3, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 2, wherein each connection probability in each connection probability matrix comprises two complementary logits (Jang, pg. 2, “The Gumbel(0, 1) distribution can be sampled using inverse transform sampling by drawing $u \sim \text{Uniform}(0, 1)$ and computing $g = -\log(-\log(u))$.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Jang for the same rationale stated at Claim 1.
Regarding claim 4, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 3, wherein performing the backwards pass to train the connection probability matrix of the machine-learned model using the straight-through Gumbel-Softmax approximation comprises reparameterizing the sample distribution from a Bernoulli distribution to a Gumbel distribution (Jang, pg. 2, “Let z be a categorical variable with class probabilities $\pi_1, \pi_2, \ldots \pi_k$… the Gumbel-Max trick… provides a simple and efficient way to draw samples z from a categorical distribution with class probabilities $\pi$: $z = \text{one\_hot}\left(\arg\max_i \left[g_i + \log \pi_i\right]\right)$ where $g_1 \ldots g_k$ are i.i.d samples drawn from Gumbel(0,1).”).1
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Jang for the same rationale stated at Claim 1.
Regarding claim 5, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 4, wherein reparameterizing the sample distribution from a Bernoulli distribution to a Gumbel distribution comprises adding independent noise from the Gumbel distribution to each of the logits and selecting the binary value with the highest logit as the sample distribution (Jang, pg. 2, “Let z be a categorical variable with class probabilities $\pi_1, \pi_2, \ldots \pi_k$… the Gumbel-Max trick…provides a simple and efficient way to draw samples z from a categorical distribution with class probabilities $\pi$: $z = \text{one\_hot}\left(\arg\max_i \left[g_i + \log \pi_i\right]\right)$ where $g_1 \ldots g_k$ are i.i.d samples drawn from Gumbel(0,1) [i.e. adding independent noise from the Gumbel distribution to each of the logits and selecting the binary value with the highest logit as the sample distribution].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Jang for the same rationale stated at Claim 1.
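For illustration only, the Gumbel-Max trick quoted from Jang can be specialized to a binary (Bernoulli) connection variable as sketched below. This is a minimal sketch under stated assumptions, not code from any cited reference: the two complementary logits are assumed to be log p and log(1 - p) for a connection probability p strictly between 0 and 1, and the names `sample_connection` and `empirical_rate` are hypothetical.

```python
import math
import random


def sample_gumbel(rng):
    """Draw g ~ Gumbel(0, 1) via g = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))


def sample_connection(p, rng):
    """Return 1 (connection active) with probability p, else 0.

    Independent Gumbel noise is added to each of the two complementary
    logits, and the binary value with the highest noisy logit is selected.
    Assumes 0 < p < 1.
    """
    noisy_on = math.log(p) + sample_gumbel(rng)
    noisy_off = math.log(1.0 - p) + sample_gumbel(rng)
    return 1 if noisy_on > noisy_off else 0


def empirical_rate(p, n, seed=0):
    """Fraction of n samples in which the connection is active."""
    rng = random.Random(seed)
    return sum(sample_connection(p, rng) for _ in range(n)) / n
```

Over many draws, `empirical_rate(p, n)` should approach p, which is the sense in which the noisy arg max reproduces sampling from the original Bernoulli distribution.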
Regarding claim 6, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 2, wherein performing the forward pass using the test input and the one or more connection probability matrices to generate the sample distribution of test outputs comprises: for each of a plurality of different routing matrices, each routing matrix comprising a binary matrix indicative of which components in a particular layer are activated (Misra, pg. 3996, see also figs. 3 and 4, “Given two activation maps $x_A, x_B$ from layer l for both the tasks, we learn linear combinations $\tilde{x}_A$, $\tilde{x}_B$…of both the input activations and feed these combinations as input to the next layers’ filters. This linear combination is parameterized using α. Specifically, at location (i, j) in the activation map, $\begin{bmatrix} \tilde{x}_A^{ij} \\ \tilde{x}_B^{ij} \end{bmatrix} = \begin{bmatrix} \alpha_{AA} & \alpha_{AB} \\ \alpha_{BA} & \alpha_{BB} \end{bmatrix} \begin{bmatrix} x_A^{ij} \\ x_B^{ij} \end{bmatrix}$ We refer to this the cross-stitch operation, and the unit that models it for each layer l as the cross-stitch unit. The network can decide to make certain layers task specific by setting $\alpha_{AB}$ or $\alpha_{BA}$ to zero, or choose a more shared representation by assigning a higher value to them.”), routing the test input through the activated components in each layer of the machine-learned model according to the respective routing matrix to generate a respective test output (Misra, pgs. 3997-3998, see also fig. 4 “For ablative analysis we consider the tasks of…Surface Normal Prediction (SN)…we use the standard train/test splits…[w]e combine two AlexNet architectures using the cross-stitch units as shown in Figure 4. We experimented with applying cross-stitch units after every convolution activation map and after every pooling activation map, and found the latter performed better. Thus, the cross-stitch units for AlexNet are applied on the activation maps for pool1, pool2, pool5, fc6 and fc7. We maintain one cross-stitch unit per ‘channel’ of the activation map, e.g., for pool1 we have 96 cross-stitch units.”); and sampling the plurality of test outputs according to the one or more connection probability matrices to generate the sample distribution of test outputs (Ma, pg. 5, “The whole model, including both model parameters W and latent variable distribution parameters $\log(\alpha)$, is trained by stochastic gradient based optimization. For each mini-batch in the forward pass, we first sample a group of uniform random variables u, then calculate z to obtain the network architecture, and finally feed the input data into the model to compute the loss.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.
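For illustration only, the cross-stitch operation quoted from Misra can be sketched as below. This is a minimal sketch in plain Python rather than Misra's implementation; the function name `cross_stitch` and the representation of activation maps as nested lists are assumptions made for the example. At each location (i, j) the two input activations are mixed by the 2x2 matrix of α values.

```python
def cross_stitch(x_a, x_b, alpha):
    """Apply a cross-stitch unit to two equally shaped 2-D activation maps.

    alpha is ((a_AA, a_AB), (a_BA, a_BB)). For every location (i, j):
        out_a[i][j] = a_AA * x_a[i][j] + a_AB * x_b[i][j]
        out_b[i][j] = a_BA * x_a[i][j] + a_BB * x_b[i][j]
    """
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    out_a = [[a_aa * xa + a_ab * xb for xa, xb in zip(row_a, row_b)]
             for row_a, row_b in zip(x_a, x_b)]
    out_b = [[a_ba * xa + a_bb * xb for xa, xb in zip(row_a, row_b)]
             for row_a, row_b in zip(x_a, x_b)]
    return out_a, out_b
```

Setting the off-diagonal values $\alpha_{AB}$ and $\alpha_{BA}$ to zero leaves each task's activations unmixed (a task-specific layer), while larger off-diagonal values share more of the representation between the two tasks.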
Regarding claim 7, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 6, wherein performing the forward pass using the test input and the one or more connection probability matrices to generate the sample distribution of test outputs further comprises: for a particular routing matrix: inputting the test input into one or more activated components of a first layer of the machine-learned model according to the particular routing matrix: receiving, as an output of the one or more activated components, a respective output (Misra, pgs. 3997-3998, see also fig. 4 “For ablative analysis we consider the tasks of…Surface Normal Prediction (SN)…we use the standard train/test splits…[w]e combine two AlexNet architectures using the cross-stitch units as shown in Figure 4. We experimented with applying cross-stitch units after every convolution activation map and after every pooling activation map, and found the latter performed better. Thus, the cross-stitch units for AlexNet are applied on the activation maps for pool1, pool2, pool5, fc6 and fc7. We maintain one cross-stitch unit per ‘channel’ of the activation map, e.g., for pool1 we have 96 cross-stitch units.”); and aggregating the respective outputs into an aggregated output (Ma, pg. 2, Figure 1 details the following:

[Image: media_image2.png (Ma, Figure 1)]

In the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.

Regarding claim 8, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 7, wherein aggregating the respective outputs into an aggregated output comprises averaging the respective outputs (Ma, pg. 2, Figure 1 details the following:

[Image: media_image2.png (Ma, Figure 1)]

In the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”).
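For illustration only, a weighted average of sub-network outputs with scalar weights, as described in the quoted passage of Ma, can be sketched as below. This is a minimal sketch and not Ma's implementation: the function name `weighted_average` is hypothetical, and normalizing by the sum of the weights is an assumption made for the example.

```python
def weighted_average(outputs, weights):
    """Combine equally sized output vectors using scalar weights.

    outputs: list of vectors (lists of floats), one per sub-network.
    weights: one scalar per sub-network (e.g. sampled latent variables).
    Normalization by the weight sum is an assumption of this sketch.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) / total
            for d in range(dim)]
```

With equal weights the sketch reduces to a plain average of the respective outputs, and a weight of zero removes the corresponding sub-network from the aggregate.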
Regarding claim 9, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 7, further comprising: inputting the aggregated output (Ma, pg. 2; as Figure 1 details, in the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”) into one or more activated components of a second layer of the machine-learned model according to a second routing matrix associated with the second layer (Misra, pgs. 3997-3998, see also fig. 4 “We combine two AlexNet architectures using the cross-stitch units as shown in Figure 4. We experimented with applying cross-stitch units after every convolution activation map and after every pooling activation map, and found the latter performed better. Thus, the cross-stitch units for AlexNet are applied on the activation maps for pool1, pool2, pool5, fc6 and fc7. We maintain one cross-stitch unit per ‘channel’ of the activation map, e.g., for pool1 we have 96 cross-stitch units.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.
Regarding claim 10, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 7, further comprising: inputting the aggregated output into a task-specific head to generate a test output (Ma, pg. 2, Figure 1 details the following:

[Image: media_image2.png (Ma, Figure 1)]

In the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”).
Regarding claim 11, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 2, wherein training the components of the machine-learned model based at least in part on the sample distribution comprises training the components of the machine-learned model using a gradient descent (Ma, pg. 5, “The whole model, including both model parameters W and latent variable distribution parameters log($\alpha$), is trained by stochastic gradient based optimization. For each mini-batch in the forward pass, we first sample a group of uniform random variables u, then calculate z to obtain the network architecture, and finally feed the input data into the model to compute the loss.”).
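For illustration only, the forward-pass procedure quoted from Ma (sample u, then calculate z) may be sketched as follows. The sketch assumes the hard concrete relaxation that is commonly paired with L0 regularization; the function names sample_gate and sample_routing and the constants beta, gamma, and zeta are hypothetical and are not taken from the cited passage.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gate(log_alpha, beta=0.5, gamma=-0.1, zeta=1.1, rng=random):
    """Sample one relaxed connection variable z in [0, 1].

    Assumed hard concrete form; beta, gamma, zeta are illustrative values.
    """
    # Uniform noise u, kept away from 0 and 1 so both logs stay finite.
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    # Reparameterized sample: the randomness is in u, so z depends
    # smoothly on log_alpha and gradients can flow to it.
    s = _sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)
    # Stretch to (gamma, zeta), then clip so exact 0 and 1 are reachable.
    s_bar = s * (zeta - gamma) + gamma
    return min(1.0, max(0.0, s_bar))


def sample_routing(log_alphas, rng=random):
    """Sample a matrix of gates z[i][j] from a matrix of log(alpha) values."""
    return [[sample_gate(la, rng=rng) for la in row] for row in log_alphas]
```

Under these assumptions, a large positive log(alpha) yields a gate of exactly 1 (connection kept) and a large negative value yields exactly 0 (connection dropped); intermediate values yield a stochastic gate.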
Regarding claim 12, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 2, further comprising: initializing each of the one or more connection probability matrices by selecting an initial value for each connection probability in each connection probability matrix (Misra, pg. 3998, see also table 1, “To ensure that values after the cross-stitch operation are of the same order of magnitude as the input values, an obvious initialization of the unit is that the $\alpha$ values form a convex linear combination, i.e., the different-task $\alpha_D$ and the same-task $\alpha_S$ to sum to one…[f]or this experiment, we initialize the networks A and B with one-task networks that were fine-tuned on the respective tasks. Table 1 shows the results of evaluating cross-stitch networks for different initializations of $\alpha$ values.”).
  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.
Regarding claim 13, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 12, wherein the initial value for each connection probability is approximately 0.5 (Misra, pg. 3998, Table 1 details that one of the initial values for the connection probability for ($\alpha_S$, $\alpha_D$) is (0.5, 0.5):

[Image: media_image3.png (Misra, Table 1)]

).
 It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.
Regarding claim 14, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 12, wherein the initial value for each connection probability is selected to encourage or discourage a particular routing pathway (Misra, pg. 3998, Table 1 is detailed below:

[Image: media_image3.png (Misra, Table 1)]

By “[i]nitializing cross-stitch units with different $\alpha$ values, each corresponding to a convex combination. Higher values for $\alpha_S$ indicate that we bias the cross-stitch unit to prefer task specific representations [and vice-versa]. The cross-stitched network is robust across different initializations of the units.” As Table 1 shows, the best performing initial values of the connection probabilities are (0.9, 0.1) for the tasks of Surface Normal Prediction and Semantic Segmentation, which encourages task-specific routing paths rather than shared routing paths.).
 It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at Claim 1.
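For illustration only, the convex initialization that Misra describes for a two-task cross-stitch unit may be sketched as follows. The sketch assumes a 2x2 unit with the same-task value on the diagonal and the different-task value off the diagonal; the function names init_cross_stitch and apply_cross_stitch are hypothetical and do not appear in the cited reference.

```python
def init_cross_stitch(alpha_same, alpha_diff):
    """Build a 2x2 cross-stitch unit whose rows form a convex combination.

    alpha_same weights a task's own activations; alpha_diff weights the
    other task's activations. The two values must sum to one.
    """
    if alpha_same < 0 or alpha_diff < 0:
        raise ValueError("alpha values must be non-negative")
    if abs(alpha_same + alpha_diff - 1.0) > 1e-9:
        raise ValueError("alpha values must sum to one")
    return [[alpha_same, alpha_diff],
            [alpha_diff, alpha_same]]


def apply_cross_stitch(unit, x_a, x_b):
    """Linearly mix the activations of task A and task B, element by element."""
    out_a = [unit[0][0] * a + unit[0][1] * b for a, b in zip(x_a, x_b)]
    out_b = [unit[1][0] * a + unit[1][1] * b for a, b in zip(x_a, x_b)]
    return out_a, out_b
```

Under these assumptions, init_cross_stitch(0.5, 0.5) mixes the two tasks equally, while init_cross_stitch(0.9, 0.1) biases each task toward its own representation.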
Regarding claim 16, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 1, wherein training the machine-learned model for the particular task is performed for a plurality of iterations (Ma, pg. 6, “All the models are trained using Adam… with learning rate as a tunable hyperparameter. The batch size is fixed as 128. Early stopping is used on the validation set [i.e. performed for a plurality of iterations].”), and wherein, upon completion of the plurality of iterations, the method further comprises: selecting a maximum likelihood variant for each connection probability in the connection probability matrix associated with a particular layer as a corresponding binary value in a routing matrix to be used for inference (Ma, pg. 5, “The full objective function with L0 regularization is

[Image: media_image4.png (Ma, objective function)]

[i.e. selecting a maximum likelihood variant for each connection probability in the connection probability matrix associated with a particular layer] The whole model, including both model parameters W and latent variable distribution parameters log($\alpha$), is trained by stochastic gradient based optimization. For each mini-batch in the forward pass, we first sample a group of uniform random variables u, then calculate z to obtain the network architecture, and finally feed the input data into the model to compute the loss. The gradients w.r.t. W and log($\alpha$) are calculated by back-propagation [i.e. binary value in a routing matrix to be used for inference].”).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Ma, Jiaqi, et al. "Snr: Sub-network routing for flexible parameter sharing in multi-task learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019 (“Ma”) in view of Misra, Ishan, et al. "Cross-stitch networks for multi-task learning." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016 (“Misra”) and in view of Jang, Eric, et al. "Categorical reparameterization with gumbel-softmax." arXiv preprint arXiv:1611.01144 (2016)(“Jang”) and further in view of Yadwadkar, Neeraja J., et al. "Multi-task learning for straggler avoiding predictive job scheduling." The Journal of Machine Learning Research 17.1 (2016): 3692-3728(“Yadwadkar”). 
Regarding claim 15, Ma in view of Misra and in view of Jang teaches the computer-implemented method of claim 2, but does not teach wherein training the machine-learned model for the particular task further comprises training the machine-learned model for the particular task using a budget penalty, wherein the budget penalty penalizes the machine-learned model for exceeding a given computational budget.
However, Yadwadkar teaches: wherein training the machine-learned model for the particular task further comprises training the machine-learned model for the particular task using a budget penalty, wherein the budget penalty penalizes the machine-learned model for exceeding a given computational budget (Yadwadkar, pgs. 11-12, “Suppose there are T learning tasks, with the training set for the t-th learning-task denoted by $D_t = \{(x_{it}, y_{it}) : i = 1, \ldots, k_t\}$, with $x_{it} \in \mathbb{R}^d$…[i]n our application, one may split learning-tasks into groups based on workload or based on nodes. We call one particular way of dividing learning-tasks into groups a partition. The p-th partition has $G_p$ groups, and the learning-task t belongs to the $g_p(t)$ group under this partition. Now, we also have a separate set of weight vectors for each partition p, and the weight vector of the g-th group of the p-th partition is denoted by $w_{p,g}$ [i.e. a given computational budget]” and the minimization of the training loss for the particular task is given by the following formula:

[Image: media_image5.png (formula from Yadwadkar)]

Lowering $\lambda_p$ for a particular partition will reduce the penalty on the weights of the p-th partition and thus cause the model to rely more on the p-th partitioning. For the base partition p = 0, setting $\lambda_p$ = 0 would thus favor as much parameter sharing as feasible [i.e. using a budget penalty, wherein the budget penalty penalizes the machine-learned model for exceeding a given computational budget]).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma in view of Misra and in view of Jang with the teachings of Yadwadkar. The motivation to do so would be to perform large-scale selective parameter sharing for multi-task networks in a distributed computing environment (Yadwadkar, pg. 10, “Wrangler builds separate models for each workload and for every node. Thus, every {node, workload} tuple is a separate learning problem. However, learning problems corresponding to different workloads executed on the same node clearly have something in common, as do learning tasks corresponding to different nodes executing the same workload. We want to use this shared structure between the learning problems to reduce data collection time… [w]e turn to multi-task learning to leverage this shared structure. In the terminology of multi-task learning, each {node, workload} pair forms a separate learning-task and these learning problems have a shared structure between them. However, unlike typical MTL formulations, our learning-tasks are not simply correlated with each other; they share a specific structure, clustering along node- or workload-dependent axes.”).
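For illustration only, the effect of the per-partition coefficients discussed in the rejection of claim 15 (a lower coefficient for a partition reduces the penalty on that partition's weights) may be sketched as follows. The sketch assumes a squared-norm penalty summed over the groups of each partition, which may not match the exact formula in the cited reference; the function name partition_penalty and the nested-list layout are hypothetical.

```python
def partition_penalty(weights, lambdas):
    """Weighted penalty over partitions (assumed squared-norm form).

    weights[p][g] is the weight vector (list of floats) of group g in
    partition p; lambdas[p] is the penalty coefficient for partition p.
    """
    if len(weights) != len(lambdas):
        raise ValueError("need one coefficient per partition")
    total = 0.0
    for groups, lam in zip(weights, lambdas):
        # Squared L2 norm of every group's weight vector in this partition.
        squared_norm = sum(w * w for vec in groups for w in vec)
        total += lam * squared_norm
    return total
```

Under these assumptions, setting the coefficient of a partition to zero removes its contribution to the penalty entirely, so the optimizer is free to rely on that partition's weights.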

Claims 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, Jiaqi, et al. "Snr: Sub-network routing for flexible parameter sharing in multi-task learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019 (“Ma”) in view of Yadwadkar, Neeraja J., et al. "Multi-task learning for straggler avoiding predictive job scheduling." The Journal of Machine Learning Research 17.1 (2016): 3692-3728(“Yadwadkar”) and in view of Jang, Eric, et al. "Categorical reparameterization with gumbel-softmax." arXiv preprint arXiv:1611.01144 (2016)(“Jang”). 
Regarding claim 17, Ma teaches a computing system, a multi-task machine-learned model configured to perform a plurality of tasks T, comprising: a plurality of layers L, each layer comprising a plurality of components C (Ma, pgs. 2-6, Figure 1 is detailed below:

[Image: media_image1.png (Ma, Figure 1)]

It details a Sub-Network Routing with Transformation (SNR-Trans) model and a Sub-Network Routing with Average (SNR-Aver) model consisting of two tasks, Task A and Task B, in which the sub-networks below Tower A and Tower B are made up of a plurality of layers and linear transformation components); a routing matrix of size TxC associated with each respective layer, the routing matrix for a particular layer comprising a matrix of binary allocation variables descriptive of which components in the respective layer an input into the machine-learned model is routed through to generate an output (Ma, pgs. 3-4, “Suppose there are two subsequent layers of sub-networks, and the lower-level layer has 3 sub-networks and the higher-level layer has 2 sub-networks. Let $u_1, u_2, u_3$ be the outputs of the lower-level sub-networks and let $v_1, v_2$ be the inputs of the higher-level sub-networks. Then SNR-Trans can be formulated as $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$, where $W_{ij}$ is a transformation matrix from the jth lower level sub-network to ith higher-level sub-network and z represents the coding variables (a group of binary variables controlling the connection). Similarly, SNR-Aver can be formulated as 
                            
                                
                                    
                                        
                                            
                                                
                                                    
                                                        v
                                                    
                                                    
                                                        1
                                                    
                                                
                                            
                                        
                                        
                                            
                                                
                                                    
                                                        v
                                                    
                                                    
                                                        2
                                                    
                                                
                                            
                                        
                                    
                                
                            
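$= \begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$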
where $I_{ij}$ is an identity matrix for all i,j… [f]or example, let's suppose $v_1$, $v_2$ represent the outputs of sub-networks to two tasks and there is only one layer of hidden sub-networks $u_1$, $u_2$, $u_3$. If we set all elements of z as 1, then the corresponding model degenerates to the classic shared-bottom model. If we set $z_{11} = z_{22} = 1$ and all other elements of z as 0, then the model degenerates to two small single-task models…we propose to model the coding variables z as latent random variables from parameterized distributions, and learn the distribution parameters and model parameters simultaneously.” Ma teaches:
                            
                                
                                    
                                        
                                            
                                                
                                                    
$\begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix}$ and $\begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix}$, where z represents the coding variables, a group of binary variables controlling the connection [i.e. a routing matrix of size TxC associated with each respective layer, the routing matrix for a particular layer comprising a matrix of binary allocation variables descriptive of which components in the respective layer an input into the machine-learned model is routed through to generate an output]);
and a plurality of task-specific heads, each task-specific head configured to receive an output from a final layer of the one or more layers and generate an output associated with a respective task (Ma, pgs. 2-6, Figure 1 is detailed below:

    [Image: media_image1.png (Ma, Figure 1)]

It details a Sub-Network Routing with Transformation (SNR-Trans) model and a Sub-Network Routing with Average (SNR-Aver) model consisting of two tasks, Task A and Task B); the operations comprising: obtaining an input; selecting a particular task; (Ma, pgs. 5-6, see also fig. 5, “We use YouTube8M… as our benchmark dataset to evaluate the effectiveness of the proposed methods. This dataset consists of 6.1 million of YouTube videos, each with (multiple) labels from a vocabulary of more than 3,000 topical entities. The topical entities can be further grouped into 24 top-level topic categories. To create a multi-task learning problem from the dataset, we treat each top-level topic category as a separate prediction task, so that each task is a multi-label classification problem. To ensure data quantity per task, we used the top 16 categories in data volume. We use the training set provided in the original dataset as our training set, and split the original validation set into our own validation set and test set.”); routing the input through the machine-learned model according to the respective routing matrix for each layer for the particular task (Ma, pgs. 3-4, “Suppose there are two subsequent layers of sub-networks, and the lower-level layer has 3 sub-networks and the higher-level layer has 2 sub-networks. Let $u_1$, $u_2$, $u_3$ be the outputs of the lower-level sub-networks and let $v_1$, $v_2$ be the inputs of the higher-level sub-networks. Then SNR-Trans can be formulated as
                            
                                
                                    
                                        
                                            
                                                
                                                    
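$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix}$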
                            
                                
                                    
                                        
                                            
                                                
                                                    
                                                        u
                                                    
                                                    
                                                        1
                                                    
                                                
                                            
                                        
                                        
                                            
                                                
                                                    
                                                        u
                                                    
                                                    
                                                        2
                                                    
                                                
                                            
                                        
                                        
                                            
                                                
                                                    
                                                        u
                                                    
                                                    
                                                        3
                                                    
                                                
                                            
                                        
                                    
                                
                            
                        
, where $W_{ij}$ is a transformation matrix from the jth lower level sub-network to ith higher-level sub-network and z represents the coding variables (a group of binary variables controlling the connection). Similarly, SNR-Aver can be formulated as $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ where $I_{ij}$ is an identity matrix for all i,j… [f]or example, let's suppose $v_1$, $v_2$ represent the outputs of sub-networks to two tasks and there is only one layer of hidden sub-networks $u_1, u_2, u_3$. If we set all elements of z as 1, then the corresponding model degenerates to the classic shared-bottom model. If we set $z_{11} = z_{22} = 1$ and all other elements of z as 0, then the model degenerates to two small single-task models [i.e. routing the input through the machine-learned model according to the respective routing matrix for each layer for the particular task]…we propose to model the coding variables z as latent random variables from parameterized distributions, and learn the distribution parameters and model parameters simultaneously.”); and receiving, as an output of the machine-learned model, a task-specific output from the task-specific head associated with the particular task (Ma, pgs. 2-6, Figure 1 is detailed below: 

[Figure 1 of Ma (media_image1.png, greyscale)]

It details a Sub-Network Routing with Transformation (SNR-Trans) model and a Sub-Network Routing with Average (SNR-Aver) model outputting two tasks, Task A and Task B given the input); wherein the multi-task machine-learned model has been trained to jointly learn the routing matrix with the plurality of components using back-propagation (Ma, pg. 5, “The gradients w.r.t. W and log($\alpha$) are calculated by back-propagation… [a]t serving time, the following estimator… is used for z, $\hat{z} = \min(1, \max(0, \mathrm{sigmoid}(\log \alpha)(\zeta - \gamma) + \gamma))$.”).  
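For illustration only, the routing equation and serving-time estimator quoted from Ma can be sketched as follows. This is a minimal sketch and not Ma's implementation: the function names, the example vectors, and the default values of zeta and gamma are assumptions chosen for the example, and the routing function follows the equation as quoted above (a gated sum with identity transforms).

```python
import math


def serving_time_z(log_alpha, zeta=1.1, gamma=-0.1):
    """z_hat = min(1, max(0, sigmoid(log_alpha) * (zeta - gamma) + gamma)).

    zeta and gamma defaults are illustrative assumptions, not values taken from Ma.
    """
    s = 1.0 / (1.0 + math.exp(-log_alpha))
    return min(1.0, max(0.0, s * (zeta - gamma) + gamma))


def route_identity(z, u):
    """Compute v_i = sum_j z[i][j] * u[j], i.e. the quoted equation with I_ij = identity.

    z is a list of rows (one row per higher-level sub-network);
    u is a list of equal-length vectors (one per lower-level sub-network).
    """
    dim = len(u[0])
    return [
        [sum(z_ij * u_j[k] for z_ij, u_j in zip(row, u)) for k in range(dim)]
        for row in z
    ]


# Hypothetical lower-level outputs u_1, u_2, u_3.
u = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# All coding variables set to 1: every higher-level input sees every u_j.
shared = route_identity([[1, 1, 1], [1, 1, 1]], u)

# z_11 = z_22 = 1 and all others 0: each higher-level input sees one u_j only.
separate = route_identity([[1, 0, 0], [0, 1, 0]], u)
```

With every z set to 1, both higher-level inputs receive the same combination of all lower-level outputs, which corresponds to the shared-bottom case in the quotation; with only z_11 and z_22 set to 1, each input depends on a single lower-level output, which corresponds to the two single-task models.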
Ma does not teach: comprising: at least one processor; and at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform operations. 
However, Yadwadkar teaches: comprising: at least one processor; and at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform operations(Yadwadkar, pgs. 19-20, see also table 2, “The set of real-world workloads considered in this paper are collected from the production compute clusters at Facebook and Cloudera's customers, which we denote as FB2009, FB2010, CC_b and CC_e. Table 2 provides details about these workloads in terms of the number of machines in the actual clusters, the length and date of data capture, total number of jobs in those workloads… [t]ogether, the dataset consists of traces from over about 4000 machines captured over almost eight months. For faithfully replaying these real-world production traces on our 20 node EC2 cluster, we used a statistical workload replay tool, SWIM… that synthesizes a workload with representative job submission rates and patterns, shuffle/input data size and output/shuffle data ratios.”). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma in view of Yadwadkar. The motivation to do so would be to optimize a Multi-Task Learning (MTL) distributed computing environment to avoid straggler tasks executing on a node that increase computational time (Yadwadkar, pgs. 4-6, “[A] tricky situation arises when a node is available, but is performing poorly. This causes the tasks scheduled on that node to execute slower than other tasks of the same job scheduled on other nodes in the cluster. Since a job finishes execution only when all its tasks have finished execution, such slow-running tasks, called stragglers, extend the job's completion time. This, in turn, leads to increased user costs… [w]e built Wrangler a system that learns to predict nodes that might create stragglers and uses these predictions as hints to the scheduler so as to avoid creating stragglers by rejecting bad placement decisions. Thus, being proactive, Wrangler is time efficient. Also, by smarter scheduling, we avoid replication of straggler tasks. Thus, Wrangler is also efficient in terms of reducing the resources consumed.”).
Ma does not teach: wherein the multi-task machine-learned model has been trained using a straight-through Gumbel-softmax approximation.
However, Jang teaches: wherein the multi-task machine-learned model has been trained using a straight-through Gumbel-softmax approximation (Jang, pg. 3, “For scenarios in which we are constrained to sampling discrete values… we discretize y using arg max but use our continuous approximation in the backward pass by approximating $\nabla_{\theta} z \approx \nabla_{\theta} y$. We call this the Straight-Through (ST) Gumbel Estimator… ST Gumbel-Softmax allows samples to be sparse even when the temperature $\tau$ is high.”). 
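For illustration only, the straight-through estimator described in the Jang quotation can be sketched as follows. This is a minimal sketch under stated assumptions, not Jang's code: the function names, the example logits, and the hand-written backward step are hypothetical. The forward pass returns a discrete one-hot sample z (arg max of the relaxed sample y), and the backward step uses the gradient of y in place of the gradient of z.

```python
import math
import random


def gumbel_softmax_st(logits, tau, rng):
    """Forward pass: relaxed sample y and its arg-max discretization z (one-hot)."""
    # Gumbel(0, 1) noise g = -log(-log(U)); U is clamped away from 0 for safety.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(l + g) / tau for l, g in zip(logits, noise)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    y = [e / total for e in exps]
    k = max(range(len(y)), key=y.__getitem__)
    z = [1 if i == k else 0 for i in range(len(y))]
    return z, y


def straight_through_grad(grad_z, y, tau):
    """Backward pass: treat dz/dlogits as dy/dlogits (softmax Jacobian scaled by 1/tau)."""
    dot = sum(g * yi for g, yi in zip(grad_z, y))
    return [yi * (g - dot) / tau for g, yi in zip(grad_z, y)]


rng = random.Random(0)  # fixed seed so the example is repeatable
z, y = gumbel_softmax_st([1.0, 2.0, 0.5], tau=5.0, rng=rng)
grad_logits = straight_through_grad([1.0, 0.0, 0.0], y, tau=5.0)
```

Even at the high temperature used here, z remains a one-hot vector while y is a smooth distribution, which is the sparsity property the quotation attributes to ST Gumbel-Softmax.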
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the teachings of Jang. The motivation to do so would be to make categorical distributions differentiable so that neural networks can be trained with stochastic gradient descent and backpropagation (Jang, pg. 1, “[S]tochastic networks with discrete variables are difficult to train because the backpropagation algorithm - while permitting efficient computation of parameter gradients - cannot be applied to non-differentiable layers… [t]he practical outcome of this paper is a simple, differentiable approximate sampling mechanism for categorical variables that can be integrated into neural networks and trained using standard backpropagation.”).  
Regarding claim 20, Ma teaches the operations comprising: obtain a test input for a machine-learned model configured to perform a plurality of tasks(Ma, pgs. 5-6, see also fig. 5,  “We use YouTube8M… as our benchmark dataset to evaluate the effectiveness of the proposed methods. This dataset consists of 6.1 million of YouTube videos, each with (multiple) labels from a vocabulary of more than 3, 000 topical entities. The topical entities can be further grouped into 24 top-level topic categories. To create a multi-task learning problem from the dataset, we treat each top-level topic category as a separate prediction task, so that each task is a multi-label classification problem. To ensure data quantity per task, we used the top 16 categories in data volume. We use the training set provided in the original dataset as our training set, and split the original validation set into our own validation set and test set.”), the machine-learned model comprising a plurality of layers, each layer comprising a plurality of components(Ma, pgs. 2-6, Figure 1 is detailed below: 

[Figure 1 of Ma (media_image1.png, greyscale)]

It details a Sub-Network Routing with Transformation (SNR-Trans) model and a Sub-Network Routing with Average (SNR-Aver) model consisting of two tasks, Task A and Task B, in which the sub-networks below Tower A and Tower B are made up of a plurality of layers and linear transformation components.), each task assigned to select one or more components for each layer according to a connection probability matrix for each respective layer comprising a matrix of connection probabilities for each component to be used in the respective layer for the task (Ma, pgs. 3-4, “Suppose there are two subsequent layers of sub-networks, and the lower-level layer has 3 sub-networks and the higher-level layer has 2 sub-networks. Let $u_1, u_2, u_3$ be the outputs of the lower-level sub-networks and let $v_1, v_2$ be the inputs of the higher-level sub-networks. Then SNR-Trans can be formulated as                         
                            
                                
                                    
                                        
                                            
                                                
                                                    
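$\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} W_{11} & z_{12} W_{12} & z_{13} W_{13} \\ z_{21} W_{21} & z_{22} W_{22} & z_{23} W_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$, where $W_{ij}$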
is a transformation matrix from the jth lower level sub-network to ith higher-level sub-network and z represents the coding variables (a group of binary variables controlling the connection). Similarly, SNR-Aver can be formulated as $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} z_{11} I_{11} & z_{12} I_{12} & z_{13} I_{13} \\ z_{21} I_{21} & z_{22} I_{22} & z_{23} I_{23} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ where $I_{ij}$
is an identity matrix for all i,j… [f]or example, let's suppose $v_1$, $v_2$ represent the outputs of sub-networks to two tasks and there is only one layer of hidden sub-networks $u_1$, $u_2$, $u_3$. If we set all elements of z as 1, then the corresponding model degenerates to the classic shared-bottom model. If we set $z_{11} = z_{22}$
                     = 1 and all other elements of z as 0, then the model degenerates to two small single-task models…we propose to model the coding variables z as latent random variables from parameterized distributions, and learn the distribution parameters and model parameters simultaneously.”); selecting a particular task from the one or more tasks(Ma, pg. 7, Figure 5 details sub-network utilization by tasks: 

[Figure: media_image6.png]

For example, for the task of identifying Arts_Entertainment YouTube videos, the relative proportion of utilization is approximately 0.10); and training the machine-learned model for the particular task, wherein training the machine-learned model for the particular task comprises: performing a forward pass using the test input and the connection probability matrix for each layer to generate a sample distribution of test outputs(Ma, pg. 5, “The whole model, including both model parameters W and latent variable distribution parameters log($\alpha$), is trained by stochastic gradient based optimization. For each mini-batch in the forward pass, we first sample a group of uniform random variables u, then calculate z to obtain the network architecture, and finally feed the input data into the model to compute the loss.”); training the components of the machine-learned model based at least in part on the sample distribution; and performing a backwards pass to train the connection probability matrix of the machine-learned model(Ma, pg. 5, “The gradients w.r.t. W and log($\alpha$) are calculated by back-propagation… [a]t serving time, the following estimator… is used for z, $\hat{z} = \min(1, \max(0, \mathrm{sigmoid}(\log(\alpha))(\zeta - \gamma) + \gamma))$.”).
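For illustration only, the routing relation and the serving-time estimator quoted from Ma above can be sketched as follows. This is a minimal sketch and not Ma's implementation: the function names `z_hat`, `matvec`, and `route`, the nested-list data layout, and the default values chosen for ζ and γ are all assumptions introduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def z_hat(log_alpha, zeta=1.1, gamma=-0.1):
    """Serving-time estimator quoted from Ma:
    min(1, max(0, sigmoid(log_alpha) * (zeta - gamma) + gamma)).
    The defaults for zeta and gamma are illustrative assumptions."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (zeta - gamma) + gamma))


def matvec(W, u):
    """Multiply matrix W (list of rows) by vector u."""
    return [sum(w * x for w, x in zip(row, u)) for row in W]


def route(z, W, u):
    """Compute v_i = sum_j z[i][j] * (W[i][j] @ u[j]) for every output i.

    z[i][j] is the coding variable for the connection from lower-level
    sub-network j to higher-level sub-network i; W[i][j] is the matching
    transformation matrix (an identity matrix in the SNR-Aver case)."""
    outputs = []
    for i in range(len(z)):
        v_i = [0.0] * len(W[i][0])  # output dimension = number of rows of W[i][0]
        for j in range(len(u)):
            if z[i][j] == 0.0:
                continue  # connection switched off
            contrib = matvec(W[i][j], u[j])
            v_i = [a + z[i][j] * b for a, b in zip(v_i, contrib)]
        outputs.append(v_i)
    return outputs
```

Under these assumptions, setting every entry of `z` to 1 routes all three hidden outputs to both tasks (the shared-bottom case described in the quotation), while setting only `z[0][0]` and `z[1][1]` to 1 leaves each task with its own single hidden sub-network.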
Ma does not teach: One or more tangible, non-transitory computer-readable media that store instructions that, when executed by one or more processors, cause the one or more processors to perform operations.
However, Yadwadkar teaches: One or more tangible, non-transitory computer-readable media that store instructions that, when executed by one or more processors, cause the one or more processors to perform operations(Yadwadkar, pgs. 19-20, see also table 2, “The set of real-world workloads considered in this paper are collected from the production compute clusters at Facebook and Cloudera's customers, which we denote as FB2009, FB2010, CC_b and CC_e. Table 2 provides details about these workloads in terms of the number of machines in the actual clusters, the length and date of data capture, total number of jobs in those workloads… [t]ogether, the dataset consists of traces from over about 4000 machines captured over almost eight months. For faithfully replaying these real-world production traces on our 20 node EC2 cluster, we used a statistical workload replay tool, SWIM… that synthesizes a workload with representative job submission rates and patterns, shuffle/input data size and output/shuffle data ratios.”). 
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma in view of Yadwadkar. The motivation to do so would be to optimize a Multi-Task Learning (MTL) distributed computing environment so as to avoid straggler tasks that execute on a node and increase computational time (Yadwadkar, pgs. 4-6, “[A] tricky situation arises when a node is available, but is performing poorly. This causes the tasks scheduled on that node to execute slower than other tasks of the same job scheduled on other nodes in the cluster. Since a job finishes execution only when all its tasks have finished execution, such slow-running tasks, called stragglers, extend the job's completion time. This, in turn, leads to increased user costs… [w]e built Wrangler a system that learns to predict nodes that might create stragglers and uses these predictions as hints to the scheduler so as to avoid creating stragglers by rejecting bad placement decisions. Thus, being proactive, Wrangler is time efficient. Also, by smarter scheduling, we avoid replication of straggler tasks. Thus, Wrangler is also efficient in terms of reducing the resources consumed.”).
Ma does not teach: using a straight-through Gumbel-softmax approximation; each connection probability comprising two complementary logits. 
However, Jang teaches: using a straight-through Gumbel-softmax approximation(Jang, pg. 3, “For scenarios in which we are constrained to sampling discrete values… we discretize y using arg max but use our continuous approximation in the backward pass by approximating $\nabla_\theta z \approx \nabla_\theta y$. We call this the Straight-Through (ST) Gumbel Estimator… ST Gumbel-Softmax allows samples to be sparse even when the temperature $\tau$ is high.”); each connection probability comprising two complementary logits(Jang, pg. 2, “The Gumbel(0, 1) distribution can be sampled using inverse transform sampling by drawing u $\sim$ Uniform(0, 1) and computing g = - log(- log(u)).”).
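For illustration only, the straight-through estimator quoted from Jang above can be sketched as follows. This is a minimal sketch and not Jang's code: the function names, the clamping constant, and the two-entry logit vector used to stand for an on/off connection are assumptions introduced here. The forward value is the discrete arg max sample z, and the Jacobian of the continuous sample y is returned as the surrogate for the gradient of z.

```python
import math
import random


def sample_gumbel(rng):
    """Draw g = -log(-log(u)) with u ~ Uniform(0, 1), as in the quotation."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
    return -math.log(-math.log(u))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def st_gumbel_softmax(logits, tau, rng):
    """Return (z, y, jac).

    y   : continuous Gumbel-softmax sample at temperature tau
    z   : one-hot arg max of y, used in the forward pass
    jac : d y_i / d logits_j, used in place of the gradient of z
          in the backward pass (the straight-through approximation)."""
    n = len(logits)
    y = softmax([(l + sample_gumbel(rng)) / tau for l in logits])
    k = max(range(n), key=y.__getitem__)
    z = [1.0 if i == k else 0.0 for i in range(n)]
    jac = [[y[i] * ((1.0 if i == j else 0.0) - y[j]) / tau for j in range(n)]
           for i in range(n)]
    return z, y, jac
```

A hypothetical call such as `st_gumbel_softmax([math.log(0.7), math.log(0.3)], 0.5, random.Random(0))` treats the two logits as the "connected" and "not connected" outcomes of a single connection; that pairing is an assumption made for this example.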
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the teachings of Jang. The motivation to do so would be to make categorical distributions differentiable so that neural networks containing them can be trained with stochastic gradient descent and backpropagation(Jang, pg. 1, “[S]tochastic networks with discrete variables are difficult to train because the backpropagation algorithm - while permitting efficient computation of parameter gradients - cannot be applied to non-differentiable layers… [t]he practical outcome of this paper is a simple, differentiable approximate sampling mechanism for categorical variables that can be integrated into neural networks and trained using standard backpropagation.”).

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ma, Jiaqi, et al. "Snr: Sub-network routing for flexible parameter sharing in multi-task learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019 (“Ma”) in view of Yadwadkar, Neeraja J., et al. "Multi-task learning for straggler avoiding predictive job scheduling." The Journal of Machine Learning Research 17.1 (2016): 3692-3728(“Yadwadkar”) and in view of Jang, Eric, et al. "Categorical reparameterization with gumbel-softmax." arXiv preprint arXiv:1611.01144 (2016)(“Jang”) and further in view of  Misra, Ishan, et al. "Cross-stitch networks for multi-task learning." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016 (“Misra”).
Regarding claim 18, Ma in view of Yadwadkar and in view of Jang teaches the computing system of claim 17, wherein routing the input through the machine-learned model according to the respective routing matrix for each layer for the particular task comprises: and aggregating the respective outputs into an aggregated output(Ma pg. 2, As Figure 1 details, in the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”). 
Ma in view of Yadwadkar and in view of Jang does not teach: inputting the input into one or more activated components of a first layer of the machine-learned model according to a first routing matrix; receiving, as an output of the one or more activated components, a respective output.
However, Misra teaches: inputting the input into one or more activated components of a first layer of the machine-learned model according to a first routing matrix; receiving, as an output of the one or more activated components, a respective output (Misra, pg. 3996, see also figs. 3 and 4, “At each layer of the network, we model sharing of representations by learning a linear combination of the activation maps…using a cross-stitch unit. Given two activation maps $x_A$, $x_B$ from layer l for both the tasks, we learn linear combinations $\tilde{x}_A$, $\tilde{x}_B$…of both the input activations and feed these combinations as input to the next layers’ filters. This linear combination is parameterized using α. Specifically, at location (i, j) in the activation map, $\begin{bmatrix} \tilde{x}_A^{ij} \\ \tilde{x}_B^{ij} \end{bmatrix} = \begin{bmatrix} \alpha_{AA} & \alpha_{AB} \\ \alpha_{BA} & \alpha_{BB} \end{bmatrix} \begin{bmatrix} x_A^{ij} \\ x_B^{ij} \end{bmatrix}$.”).
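As an illustration of the cross-stitch operation quoted from Misra, the 2x2 linear combination can be applied at every location (i, j) of two activation maps as sketched below. This is a hedged sketch only: the function name `cross_stitch`, the nested-list representation of the activation maps, and the α values are assumptions for the example and are not taken from Misra or from the application.

```python
def cross_stitch(x_a, x_b, alpha):
    """Apply one cross-stitch unit to two same-shaped 2-D activation maps.

    x_a, x_b: activation maps for task A and task B (lists of rows).
    alpha: ((a_AA, a_AB), (a_BA, a_BB)), the combination weights.
    Returns the combined maps (x_a_tilde, x_b_tilde).
    """
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    # At each location: x_a_tilde = a_AA * x_a + a_AB * x_b.
    out_a = [
        [a_aa * va + a_ab * vb for va, vb in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]
    # At each location: x_b_tilde = a_BA * x_a + a_BB * x_b.
    out_b = [
        [a_ba * va + a_bb * vb for va, vb in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]
    return out_a, out_b


# Hypothetical 2x2 activation maps and weights that favor each task's own map.
x_a = [[1.0, 2.0], [3.0, 4.0]]
x_b = [[10.0, 20.0], [30.0, 40.0]]
alpha = ((0.9, 0.1), (0.2, 0.8))
x_a_tilde, x_b_tilde = cross_stitch(x_a, x_b, alpha)
```

In Misra the α values are learned during training; they are fixed here only to keep the example small.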
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma in view of Yadwadkar and in view of Jang with the teachings of Misra. The motivation to do so would be to incorporate cross-stitch units for parameter sharing in multi-task networks (Misra, pg. 3995, “This paper proposes cross-stitch units, using which a single network can capture all these Split-architectures (and more). It automatically learns an optimal combination of shared and task-specific representations. We demonstrate that such a cross-stitched network can achieve better performance than the networks found by brute-force enumeration and search.”).
Regarding claim 19, Ma in view of Yadwadkar and in view of Jang and further in view of Misra teaches the computing system of claim 18, wherein routing the input through the machine-learned model according to the respective routing matrix for each layer for the particular task further comprises: for each of one or more successive layers inclusive of the final layer: inputting the aggregated output of a previous layer (Ma, pg. 2, Figure 1 details the following:

[Figure: Ma, Figure 1 (media_image2.png)]

In the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.”) 
into one or more activated components of the successive layer of the machine-learned model according to a respective routing matrix for the particular task for each respective successive layer; receiving, as an output of the one or more activated components of the successive layer, a respective successive output (Misra, pgs. 3997-3998, see also fig. 4, “For ablative analysis we consider the tasks of…Surface Normal Prediction (SN)…we use the standard train/test splits…[w]e combine two AlexNet architectures using the cross-stitch units as shown in Figure 4. We experimented with applying cross-stitch units after every convolution activation map and after every pooling activation map, and found the latter performed better. Thus, the cross-stitch units for AlexNet are applied on the activation maps for pool1, pool2, pool5, fc6 and fc7. We maintain one cross-stitch unit per ‘channel’ of the activation map, e.g., for pool1 we have 96 cross-stitch units.”);
and aggregating the respective successive outputs into an aggregated successive output; and upon aggregating the respective successive outputs of the final layer into an aggregated final output, inputting the aggregated final output into the associated task-specific head of the machine-learned model to generate the task-specific output (Ma, pg. 2, Figure 1 details the following:

[Figure: Ma, Figure 1 (media_image2.png)]

In the SNR-Aver Network “the shared layers are split into sub-networks and the connection (dashed line) between the sub-networks is a weighted average with scalar latent variables as weights.” The result is then fed into Task A and Task B as task-specific output.).
 	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Ma with the above teachings of Misra for the same rationale stated at claim 18.
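
The layer-by-layer routing discussed for claim 19 can be pictured with the simplified sketch below. It is an illustration under stated assumptions, not Ma's SNR-Aver implementation and not the claimed system: the function `route`, the toy sub-networks, the per-layer weight lists standing in for routing matrices, and the task heads are all hypothetical. A component with weight 0 is treated as not activated, and weights are assumed non-negative.

```python
def route(x, layers, routing, head):
    """Route an input through layers of sub-networks for one task.

    layers: list of layers, each a list of callables (sub-networks).
    routing: per layer, one scalar weight per sub-network for this task.
    head: task-specific callable applied to the final aggregated output.
    """
    h = x
    for subnets, weights in zip(layers, routing):
        # Run only the activated components (non-zero weight) on the
        # aggregated output of the previous layer.
        active = [(w, f(h)) for f, w in zip(subnets, weights) if w != 0.0]
        if not active:
            raise ValueError("each layer needs at least one activated component")
        total = sum(w for w, _ in active)
        width = len(active[0][1])
        # Aggregate the respective outputs as a weighted average.
        h = [sum(w * out[i] for w, out in active) / total for i in range(width)]
    return head(h)


# Toy sub-networks acting on lists of floats (assumed for the example).
def double(v):
    return [2.0 * t for t in v]

def add_one(v):
    return [t + 1.0 for t in v]

def negate(v):
    return [-t for t in v]

def identity(v):
    return list(v)

layers = [[double, add_one], [negate, identity]]
routing_a = [[1.0, 1.0], [0.0, 1.0]]  # task A: both, then identity only
routing_b = [[1.0, 0.0], [1.0, 0.0]]  # task B: double, then negate
result_a = route([1.0, 2.0], layers, routing_a, head=sum)
result_b = route([1.0, 2.0], layers, routing_b, head=max)
```

The two routing lists show how the same shared sub-networks can yield different task-specific outputs depending on which components each task activates.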

  


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
WO 2021/097398 A1 (details a multi-head classification architecture in which a conditional/shared parameter is disclosed that supports class-incremental learning). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Adam Clark Standke whose telephone number is (571)270-1806. The examiner can normally be reached 10AM-7PM M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129

        1 Examiner Note: It is being interpreted that the categorical distribution has two possible outcomes, and hence is equivalent to the Bernoulli distribution.