Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Status of Claims
This action is in response to the RCE and amendments filed on January 22, 2021.
Claims 1-28 are currently pending.
Claims 1-4, 10, 13, 23, and 26 have been amended.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 22, 2021 has been entered.
 

Claim Objections
Claim 23 is objected to because of the following informalities:  line 15 of claim 23 discloses “N1 final-stage machine learning classifiers” but should read “N1 final-stage machine learning classifiers”.  Appropriate correction is required.
The previous objections to claims 5, 12, 21, and 22 are withdrawn.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
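A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
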
Claim 1 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana).

With respect to independent claim 1, Caruana teaches:
A machine learning recognition system (Caruana teaches a method for constructing ensembles of models for classification; see abstract and section 6.) comprising:
a final stage comprising N final-stage machine learning classifiers, wherein N > 1, and wherein each of the N final-stage machine learning classifiers is for classifying a data input to a classification output (Caruana teaches generating a diverse set of models (about 2000 models for each problem); see section 1.  The models taught by Caruana can be used in classification; see abstract and section 6.) and
at least one non-final stage, wherein each of the at least one non-final stages comprises one or more machine learning, data assignment classifiers that receives each data input and outputs each data input that is input to the at least one non-final stage to a selected set of the N final-stage machine learning classifiers (Caruana teaches analyzing various datasets using a selected set of models selected from the ensemble; see section 1.  The claim is silent regarding how the N final-stage classifiers are selected, therefore, Caruana’s selection methods, disclosed in section 2, are sufficient to teach the limitation.), wherein the selected set of the N final-stage machine-learning classifiers comprises one or more of, and less than N of, the N final-stage machine learning classifiers, and such that the selected set of the N final-stage machine learning classifiers classify the data inputs assigned by the at least one non-final stage (Caruana teaches a method for selecting classifier models from an ensemble and the number of selected classifiers is less than the total number of models; see abstract and sections 1 and 2.  The classifiers taught by Caruana are selected from various machine learning classifiers; see the first paragraph in the right column on page 1.).
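
For illustration only, the kind of selection relied upon above (choosing a subset of models from a larger library) can be sketched as a greedy forward selection. This is a minimal sketch under stated assumptions, not a reproduction of Caruana's procedure: the function names, the 0.5 decision threshold, the accuracy metric, and the toy prediction values are all hypothetical.

```python
from typing import Dict, List, Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of examples whose score, thresholded at 0.5, matches the label."""
    hits = sum(1 for s, y in zip(scores, labels) if (s >= 0.5) == bool(y))
    return hits / len(labels)


def select_ensemble(library: Dict[str, List[float]],
                    labels: List[int],
                    max_steps: int = 10) -> List[str]:
    """Greedy forward selection (with replacement) from a library of models.

    `library` maps a hypothetical model name to its predicted scores on a
    held-out set. At each step the model whose addition most improves the
    accuracy of the averaged prediction is added; selection stops when no
    model improves the score.
    """
    chosen: List[str] = []
    totals = [0.0] * len(labels)  # running sum of the chosen models' scores
    best_score = -1.0
    for _ in range(max_steps):
        step_name, step_score = None, best_score
        for name, preds in library.items():
            k = len(chosen) + 1
            averaged = [(t + p) / k for t, p in zip(totals, preds)]
            score = accuracy(averaged, labels)
            if score > step_score:
                step_name, step_score = name, score
        if step_name is None:  # no candidate improves the ensemble
            break
        chosen.append(step_name)
        totals = [t + p for t, p in zip(totals, library[step_name])]
        best_score = step_score
    return chosen


# Toy library of three models scored on four held-out examples (made-up values).
library = {
    "a": [0.9, 0.8, 0.2, 0.6],
    "b": [0.4, 0.6, 0.1, 0.7],
    "c": [0.7, 0.4, 0.3, 0.1],
}
labels = [1, 1, 0, 0]
selected = select_ensemble(library, labels)
```

In this toy run the selected set contains fewer models than the library, which is the property relied upon in the mapping above.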

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-12, 14, 15, 21, 26, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana), in view of Zhu et al., U.S. Patent Application Publication 2016/0275374 (Zhu).

With respect to claim 2, the rejection of claim 1 is incorporated.  Further, Caruana does not explicitly teach but Zhu teaches:
each of the one or more machine learning, data assignment classifiers assigns a data input to one and only one of the N final-stage machine learning classifiers (Classification can be performed for an overall image in a first stage. The classification type from the first classification stage can then be used to select a classification model for use in the second stage that is specific to the classification type (e.g., that was trained with content of the classification type) [i.e. the first (non-final) stage comprises one or more machine learning, data assignment classifiers that assign each data input to only one of the N final-stage machine learning classifiers], which can result in more accurate and efficient classification of blocks within the second stage, para 0021, the multi-stage image classifier 210 selects a classification model depending on the classification type from the first image classification stage 220, if the overall classification type indicates primarily text content (as depicted at 230), then the second image classification stage is performed for the image (as depicted at 240) using a classification model specific for text content 245. If the overall classification type indicates primarily video content (as depicted at 232), then the second image classification stage is performed for video content (as depicted at 250) using a classification model specific for video content 255. If the overall classification type indicates primarily picture content (as depicted at 234), then the second image classification stage is performed for picture content (as depicted at 260) using a classification model specific for picture content 265, the specific classification types (text, video, and picture) are examples, and additional and/or other classification types can be used in the second image classification stage, para 0080).
Caruana and Zhu are analogous art directed towards classification systems.  Caruana teaches a system for selecting classifiers from an ensemble and Zhu teaches a multi-stage image classifier.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate Zhu’s teaching of system implementation, data assignment, and classification into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to provide an efficient classification system across a plurality of hardware implementations; see Zhu [0019], [0097], [0106], and [0113].
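
For illustration only, the two-stage arrangement cited from Zhu (a first stage whose classification type selects exactly one type-specific model for the second stage) can be sketched as follows. This is a minimal sketch, not Zhu's implementation: the threshold rules stand in for trained models, and every name and numeric cut-off below is a hypothetical assumption.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[List[float]], str]


def make_threshold_classifier(low: str, high: str, cut: float) -> Classifier:
    """Build a toy stand-in for a trained, type-specific second-stage model."""
    def classify(x: List[float]) -> str:
        return high if sum(x) / len(x) >= cut else low
    return classify


# One second-stage model per classification type (labels are hypothetical).
SECOND_STAGE: Dict[str, Classifier] = {
    "text": make_threshold_classifier("black_text", "white_text", 0.5),
    "video": make_threshold_classifier("slow_video", "fast_video", 0.5),
    "picture": make_threshold_classifier("dark_picture", "bright_picture", 0.5),
}


def first_stage(x: List[float]) -> str:
    """Toy first stage: returns a single overall classification type."""
    spread = max(x) - min(x)
    if spread > 0.8:
        return "text"
    if spread > 0.3:
        return "video"
    return "picture"


def classify(x: List[float]) -> Tuple[str, str]:
    """Route the input to one and only one second-stage model, then classify."""
    route = first_stage(x)
    return route, SECOND_STAGE[route](x)


result_text = classify([0.0, 1.0, 0.9])
result_picture = classify([0.4, 0.5, 0.45])
```

Because `first_stage` returns a single key, each input reaches exactly one of the three second-stage models, which corresponds to the "one and only one" reading applied above.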

With respect to claim 3, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
each of the one or more machine learning, data assignment classifiers assign each data input to the one or more of, and less than N of, the N final-stage machine learning classifiers that the machine learning, data assignment classifier determines will classify the data input correctly (Multi-stage classification can provide for more efficient classification of images and/or blocks. For example, classification can be performed for an overall image in a first stage. The classification type from the first classification stage can then be used to select a classification model for use in the second stage that is specific to the classification type (e.g., that was trained with content of the classification type), which can result in more accurate and efficient classification of blocks within the second stage, para 0021).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 4, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
the at least one non-final stage comprises a first non-final stage and a second non-final stage (The multi-stage image classifier 118 classifies the input image using an approach with more than two stages, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage [i.e. a first non-final stage] results in an overall classification type for the image, the second stage [i.e. a second non-final stage] classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text), and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video), the second classification stage may be divided into multiple sub-stages (e.g., a first sub-stage that classifies the individual blocks within the limited set of classification types and a second sub-stage that further refines the classification type from the first sub-stage), para 0073.);
the first non-final stage comprises a first machine learning data classifier (the multi-stage image classifier 118 classifies the input image using an approach with more than two stages, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall classification type for the image [i.e. the first non-final stage comprises a first machine learning data classifier], the second stage classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text), and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video), para 0073.); 
(The multi-stage image classifier 118 classifies the input image using an approach with more than two stages, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall classification type for the image, the second stage classifies individual blocks of the image using a limited set of  classification types (e.g., only text or non-text) [i.e. the second non-final stage comprises 2 second stage machine learning classifiers, 2>1], and the third stage further refines the  classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video), para 0073.); 
the first machine learning data classifier of the first non-final stage classifies each data input to one or more of, and less than M of, the M second stage machine learning classifiers of the second non-final stage (The multi-stage image classifier 118 classifies the input image using an approach with more than two stages, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall classification type for the image, the second stage classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text) [i.e. one of 2 second stage machine learning classifiers of the second non-final stage], and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video), para 0073.); and
each of the M second stage machine learning classifiers of the second non-final stage classifies each data input to it to one or more of, and less than N of, the N final-stage machine learning classifiers of the final stage (The multi-stage image classifier 118 classifies the input image using an approach with more than two stages, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall  classification type for the image, the second stage classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text), and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video) [i.e. one of 4 final stage machine learning classifiers of the final stage], para 0073.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 5, the rejection of claim 4 is incorporated.  Further, Zhu teaches:
each of the N final-stage machine learning classifiers has a first machine learning architecture (Using the features, the individual blocks and/or groups of blocks are classified using the selected classification model. The classification model comprises a decision tree table and a support vector machine (SVM) kernel, para 0060.  The support vector machine kernel [i.e. a first machine learning architecture] uses a pattern recognition approach to classify the input images and/or blocks. In some implementations, the support vector machine kernel is used first in the second stage before applying the decision tree table, para 0062.); 
each of the M second stage machine learning classifiers has a second machine learning architecture (Using the features, the individual blocks and/or groups of blocks are classified using the selected classification model. The classification model comprises a decision tree table and a support vector machine (SVM) kernel, para 0060, The decision tree table [i.e. a second machine learning architecture] uses a decision tree approach to classifying the blocks using the features, decision tree learning uses a decision tree to predict the value of a target variable based on a number of inputs, para 0061.); and
the first machine learning architecture of the N final-stage machine learning classifiers is different from the second machine learning architecture of the M second stage machine learning classifiers (Using the features, the individual blocks and/or groups of blocks are classified using the selected classification model. The classification model comprises a decision tree table and a support vector machine (SVM) kernel [i.e. the first machine learning architecture is different from the second machine learning architecture], para 0060, The decision tree table [i.e. a second machine learning architecture] uses a decision tree approach to classifying the blocks using the features, decision tree learning uses a decision tree to predict the value of a target variable based on a number of inputs, para 0061, The support vector machine kernel [i.e. a first machine learning architecture] uses a pattern recognition approach to classify the input images and/or blocks. In some implementations, the support vector machine kernel is used first in the second stage before applying the decision tree table, para 0062).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 6, the rejection of claim 4 is incorporated.  Further, Zhu teaches:
at least two of the M second stage machine learning classifiers having different machine learning architectures (Using the features, the individual blocks and/or groups of blocks are classified using the selected classification model. The classification model comprises a decision tree table and a support vector machine (SVM) kernel [i.e. different machine learning architectures], para 0060, The decision tree table uses a decision tree approach to classifying the blocks using the features, decision tree learning uses a decision tree to predict the value of a target variable based on a number of inputs, para 0061, The support vector machine kernel uses a pattern recognition approach to classify the input images and/or blocks. In some implementations, the support vector machine kernel is used first in the second stage before applying the decision tree table, para 0062).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 7, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
at least two of the N final-stage machine learning classifiers having different machine learning architectures (Using the features, the individual blocks and/or groups of blocks are classified using the selected classification model. The classification model comprises a decision tree table and a support vector machine (SVM) kernel [i.e. different machine learning architectures], para 0060, The decision tree table uses a decision tree approach to classifying the blocks using the features, decision tree learning uses a decision tree to predict the value of a target variable based on a number of inputs, para 0061, The support vector machine kernel uses a pattern recognition approach to classify the input images and/or blocks. In some implementations, the support vector machine kernel is used first in the second stage before applying the decision tree table, para 0062).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 8, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
a learning coach machine learning system that distributes data throughout the final stage and the at least one non-final stage based on observations about internal states of the N final-stage machine learning classifiers and the one or more machine learning, data assignment classifiers (An information distance based on relative entropy can be used as an optimizer or cluster for feature classification. To accomplish the feature classification through relative entropy techniques the following procedure can be performed (e.g., as a training process) [i.e. a learning coach machine learning system] to determine the grouping (or clustering) of relative entropy results for various classification types, para 0046, a training process can be used to train an image classifier to distinguish between different types of image content (e.g., text content, non-text content, video content, texture content, picture content, skin content, etc.), para 0063, training of the second stage of the multi-stage image classifier comprises support vector machine training and decision tree training, an input image is received by the training process (e.g., one of a number of input images used in the training set). The image is then divided into blocks and features (e.g., a plurality of features) are calculated for the blocks. The blocks are classified manually (e.g., by a person that manually labels the blocks as being one of a number of classification types, such as text, non-text, video, texture, etc.). The support vector machine is trained to classify the blocks based on the calculated features. The output of the support vector machine training is a support vector machine kernel. The decision tree is also trained to distinguish the blocks based on the calculated features. The result of the decision tree training is a decision tree table, para 0064.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 9, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
(A training process can be used to train an image classifier to distinguish between different types of image content (e.g., text content, non-text content, video content, texture content, picture content, skin content, etc.), para 0063, training of the second stage of the multi-stage image classifier comprises support vector machine training and decision tree training, an input image is received by the training process (e.g., one of a number of input images used in the training set). The image is then divided into blocks and features (e.g., a plurality of features) are calculated for the blocks. The blocks are classified manually (e.g., by a person that manually labels the blocks as being one of a number of classification types, such as text, non-text, video, texture, etc.). The support vector machine is trained to classify the blocks based on the calculated features. The output of the support vector machine training is a support vector machine kernel. The decision tree is also trained to distinguish the blocks based on the calculated features. The result of the decision tree training is a decision tree table, para 0064.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 10, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
each of one or more machine learning, data assignment classifiers is trained through supervised training to assign a data input to one or more of the N final-stage machine learning classifiers that the machine learning, data assignment classifier determines is likely to classify the data input correctly (Multi-stage classification can provide for more efficient classification of images and/or blocks. For example, classification can be performed for an overall image in a first stage. The classification type from the first classification stage can then be used to select a classification model for use in the second stage that is specific to the classification type (e.g., that was trained with content of the classification type), which can result in more accurate and efficient classification of blocks within the second stage, para 0021.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 11, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
each of the one or more machine learning, data assignment classifiers is trained to perform the same classifications as the N final-stage machine learning classifiers (If the overall classification of the first stage has high confidence, then the second stage can be skipped and the entire image can be encoded using an encoding technique selected based on the overall classification, which can save computing resources that would otherwise be used for second stage classification, para 0021.).
See the rejection of claim 2 for the motivation to combine references.
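
For illustration only, the behavior cited from Zhu para 0021 (skipping the second stage when the first-stage classification has high confidence) can be sketched as a confidence gate. This is a minimal sketch under assumptions: the score dictionaries, the 0.9 cut-off, and the function names are hypothetical and do not come from Zhu.

```python
from typing import Dict, List, Tuple


def first_stage(image_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the overall classification type and its confidence."""
    label = max(image_scores, key=image_scores.get)
    return label, image_scores[label]


def classify_image(image_scores: Dict[str, float],
                   block_scores: List[Dict[str, float]],
                   confidence_cut: float = 0.9) -> List[str]:
    """Label every block, skipping the second stage on a confident first stage."""
    label, confidence = first_stage(image_scores)
    if confidence >= confidence_cut:
        # Second stage skipped: the first-stage label is applied to all blocks,
        # so the first stage yields the same kind of output as the second.
        return [label] * len(block_scores)
    # Otherwise classify each block individually (toy second stage).
    return [max(scores, key=scores.get) for scores in block_scores]


blocks = [{"text": 0.4, "picture": 0.6}, {"text": 0.7, "picture": 0.3}]
confident = classify_image({"text": 0.95, "picture": 0.05}, blocks)
uncertain = classify_image({"text": 0.6, "picture": 0.4}, blocks)
```

In the confident case both blocks take the first-stage label; in the uncertain case each block is labeled by the per-block stage.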

With respect to claim 12, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
the final stage has T different classification categories, such that T < N (The first classification stage can determine an overall classification for an input image (e.g., based on relative entropy results calculated for the input image in relation to various classification types). The second classification stage can be performed by dividing the image into a plurality of blocks and classifying individual blocks, or groups of blocks, based on a classification model that is specific to the overall classification of the image determined in the first classification stage, para 0018, An image can be classified to determine the type of content contained in the image. The type of the image (also called a classification type or image type) [i.e. classification categories] can indicate the primary type of content contained in the image, an image containing screen content of a computer desktop may include text content (e.g., a word processing document), graphics content (e.g., icons, windows, etc.), and video content (e.g., a video being played in a web browser), para 0023, the multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall classification type for the image, the second stage classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text), and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video). Alternatively, the second classification stage may be divided into multiple sub-stages (e.g., a first sub-stage that classifies the individual blocks within the limited set of classification types [i.e. T different classification categories] and a second sub-stage that further refines the classification type from the first sub-stage) [i.e. T < N], para 0073.);
each of the N final-stage machine learning classifiers classify a data input to an ordered set of classification categories based on ranking of classification of the data input to the T different classification categories (A threshold is used to classify the input image into one of a number of classification types, the threshold is set to the maximum value of the relative entropy among the training images (the highest value among the distributions of the training images). In another specific implementation, the threshold is set to the mean average based on integrating the distribution of the training images, para 0048.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 14, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
the non-final stage comprises a plurality P of machine learning, data assignment classifiers (The multi-stage image classifier 118 can employ a three-stage approach in which the first classification stage results in an overall classification type for the image, the second stage classifies individual blocks of the image using a limited set of classification types (e.g., only text or non-text) [i.e. the non-final stage comprises a plurality 2 of machine learning, data assignment classifiers], and the third stage further refines the classification of the individual blocks within the limited set of classification types (e.g., classifies the text blocks from the second stage as black or white text and classifies the non-text blocks from the second stage as picture or video), para 0073.);
each of the P machine learning, data assignment classifiers is located at a separate physical location (Environment 800, various types of services (e.g., computing services) are provided by a cloud 810, the cloud 810 can comprise a collection of computing devices, which may be located centrally or distributed, that provide cloud-based services to various types of users and devices connected via a network such as the Internet, para 0013.); and
the N final-stage machine learning classifiers are connected to the P machine learning, data assignment classifiers by a data switching network (Environment 800, various types of services (e.g., computing services) are provided by a cloud 810, the cloud 810 can comprise a collection of computing devices, which may be located centrally or distributed, that provide cloud-based services to various types of users and devices connected via a network such as the Internet, para 0013, The modem 760 is shown generically and can include a cellular modem for communicating with the mobile communication network 704 and/or other radio-based modems (e.g., Bluetooth 764 or Wi-Fi 762). The wireless modem 760 is typically configured for communication with one or more cellular networks, such as a GSM network for data and voice communications within a single cellular network, between cellular networks, or between the mobile device and a public switched telephone network (PSTN), para 0111.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 15, the rejection of claim 14 is incorporated.  Further, Zhu teaches:
the data switching network comprises a packet-switched network (The modem 760 is shown generically and can include a cellular modem for communicating with the mobile communication network 704 and/or other radio-based modems (e.g., Bluetooth 764 or Wi-Fi 762). The wireless modem 760 is typically configured for communication with one or more cellular networks, such as a GSM network for data and voice communications within a single cellular network, between cellular networks, or between the mobile device and a public switched telephone network (PSTN), para 0111.).
See the rejection of claim 2 for the motivation to combine references.

With respect to claim 21, the rejection of claim 1 is incorporated.  Further, Zhu teaches:
the N final-stage machine learning classifiers are image classifiers (Performing multi-stage image classification, para 0006.).
See the rejection of claim 2 for the motivation to combine references.

With respect to independent claim 26, Caruana teaches:
(Caruana teaches a method for constructing ensembles of models for classification; see abstract and section 6.),
… executes one or more programs to implement a final stage comprising N final-stage machine learning classifiers, wherein N > 1, and wherein each of the N final-stage machine learning classifiers is for classifying a data input to a classification output (Caruana teaches generating a diverse set of models (about 2000 models for each problem); see section 1.  The models taught by Caruana can be used in classification; see abstract and section 6.); and 
… executes one or more programs to implement at least one non-final stage, wherein each of the at least one non-final stages comprises one or more machine learning, data assignment classifiers that receives each data input and outputs each data input that is input to the at least one non-final stage to a selected set of the N final-stage machine learning classifiers (Caruana teaches analyzing various datasets using a selected set of models selected from the ensemble; see section 1.  The claim is silent regarding how the N final-stage classifiers are selected, therefore, Caruana’s selection methods, disclosed in section 2, are sufficient to teach the limitation.), wherein the selected set of the N final-stage machine-learning classifiers comprises one or more of, and less than N of, the N final-stage machine learning classifiers, and such that the selected set of the N final-stage machine learning classifiers classify the data inputs assigned by the at least one non-final stage (Caruana teaches a method for selecting classifier models from an ensemble and the number of selected classifiers is less than the total number of models; see abstract and sections 1 and 2.  The classifiers taught by Caruana are selected from various machine learning classifiers; see the first paragraph in the right column on page 1.).

Caruana does not explicitly teach but Zhu teaches:
the computer system comprising a plurality of servers (A computing device can obtain an input image and classify the input image using the multi-stage image classification techniques, para 0008, The input image 125 can also be received from another image source 122 (e.g., from a video file or picture file stored on the computing device 110, from an image capture device, from an external source such as a remote server, etc.), para 0071.), wherein (The computing system 600 includes one or more processing units 610, 615 and memory 620,625. A processing unit can be a general-purpose central processing unit (CPU), processor in an application-specific integrated circuit (ASIC), or any other type of processor, para 0098.), such that:
Caruana and Zhu are analogous art directed towards classification systems.  Caruana teaches a system for selecting classifiers from an ensemble and Zhu teaches a multi-stage image classifier.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate Zhu’s teaching of system implementation, data assignment, and classification into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to provide an efficient classification system across a plurality of hardware implementations; see Zhu [0019], [0097], [0106], and [0113].

With respect to claim 27, the rejection of claim 26 is incorporated.  Further, Zhu teaches:
the processing cores comprise GPU cores (A processing unit can be a general-purpose central processing unit (CPU), processor in an application-specific integrated circuit (ASIC), or any other type of processor, a central processing unit 610 as well as a graphics processing unit (i.e., GPU) or co-processing unit 615, para 0098.).
See the rejection of claim 26 for the motivation to combine references.

Claim 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Zhu et al., U.S. Patent Application Publication 2016/0275374 (Zhu); in view of LeBoeuf et al., U.S. Patent Application Publication 2011/0075851 (LeBoeuf).

With respect to claim 13, the rejection of claim 1 is incorporated.  Caruana and Zhu do not explicitly disclose:
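metadata are associated with each data input item; and
the one or more machine learning, data assignment classifiers assigns each data input to the one or more of, and less than N of, the N final-stage machine learning classifiers based in part on the metadata.
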
However, LeBoeuf teaches these limitations:
metadata are associated with each data input item (Use multi-stage signal analysis, sound-object recognition, and audio stream labeling to analyze audio signals. The resulting labels and metadata allow software and signal processing algorithms to make content-aware decisions, para 0013.); and
the one or more machine learning, data assignment classifiers assigns each data input to the one or more of, and less than N of, the N final-stage machine learning classifiers based in part on the metadata (By using audio signal analysis and machine learning techniques, the type of sound objects presented at the input stage of an audio presentation can be determined in real-time. Sound object types include a male vocalist, female vocalist, snare drum, bass guitar, or guitar feedback. The types of sound objects are not limited to musical instruments, but are inclusive of a classification hierarchy for nearly all natural and artificially created sound: animal sounds, sound effects, medical sounds, auditory environments, and background noises. Sound object recognition may include a single label or a ratio of numerous labels, para 0018, a third stage of processing involves machine-learning, data-mining, or artificial intelligence processing such as but not limited to support vector machines (SVM), neural networks (NN), partitioning/clustering, constraint satisfaction, stream labeling, expert systems, classification according to instrument, genre, artist, etc., time-series classification and/or sound object source separation, para 0022.). It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Zhu with the teachings of LeBoeuf for the purpose of using high-level metadata features and symbolic object labels derived from a source and making context-aware decisions (LeBoeuf abstract, para 0013.).
Caruana, Zhu, and LeBoeuf are analogous art directed towards data classification.  Caruana teaches a system for selecting classifiers from an ensemble, Zhu teaches a multi-stage image classifier, and LeBoeuf teaches a multi-stage analysis system for labeling audio and video data using metadata.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate LeBoeuf’s teaching of metadata into Zhu’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to automate decisions so users can focus attention elsewhere, as described in LeBoeuf para 0013.

Claim 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Zhu et al., U.S. Patent Application Publication 2016/0275374 (Zhu); in view of Gould et al., U.S. Patent Application Publication 2003/0024993 (Gould).

With respect to claim 16, the rejection of claim 14 is incorporated and Zhu teaches:
the N final-stage machine learning classifiers are distributed across two or more geographically distributed sites (Environment 800, various types of services (e.g., computing services) are provided by a cloud 810, the cloud 810 can comprise a collection of computing devices, which may be located centrally or distributed, that provide cloud-based services to various types of users and devices connected via a network such as the Internet, para 0013.);

Zhu does not explicitly disclose:
each of the two or more geographically distributed sites comprises one or more inactive final-stage machine learning classifiers;
the N final-stage machine learning classifiers are stored in primary computer memory; and the inactive final-stage machine learning classifiers are stored in secondary computer memory.

However, Gould teaches these limitations:
each of the two or more geographically distributed sites comprises one or more inactive final-stage machine learning classifiers (A memory management method for a memory having a plurality of program contexts including an active program context in an active region and at least one inactive program context in an inactive region, para 0007.);
the N final-stage machine learning classifiers are stored in primary computer memory (A memory management method for a memory having a plurality of program contexts including an active program context in an active region and at least one inactive program context in an inactive region, each program context having a first and a second part wherein said first and second parts of said active program context are separated by a contiguous memory space comprising a free memory block and a common data parameter store, para 0007, several such non-persistent program contexts, stored in RAM 8, including currently inactive program contexts 20, 22 and a currently active program context 24. In this embodiment, the RAM 8 is divided into two regions, an active region 26 and a firewall protected or inactive region 28. The active program context 24 comprising upper block 30 (which is a first part of a program context) and lower block 34 (which is a second part of a program context) resides in the active region 26, para 0040); and 
the inactive final-stage machine learning classifiers are stored in secondary computer memory (A memory management method for a memory having a plurality of program contexts including an active program context in an active region and at least one inactive program context in an inactive region, each program context having a first and a second part wherein said first and second parts of said active program context are separated by a contiguous memory space comprising a free memory block and a common data parameter store, para 0007, several such non-persistent program contexts, stored in RAM 8, including currently inactive program contexts 20, 22 and a currently active program context 24. In this embodiment, the RAM 8 is divided into two regions, an active region 26 and a firewall protected or inactive region 28. The inactive program contexts 20, 22 reside in the inactive region 28, para 0040.).
Caruana, Zhu, and Gould are analogous art directed towards classification systems.  Caruana teaches a system for selecting classifiers from an ensemble, Zhu teaches a multi-stage image classifier, and Gould teaches a memory management system.

Claims 17-20 and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Zhu et al., U.S. Patent Application Publication 2016/0275374 (Zhu); in view of Kalinli-Akbacak, U.S. Patent Application Publication 2014/0149112 (Kalinli-Akbacak).

With respect to claim 17, the rejection of claim 1 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the machine learning recognition system comprises a speech recognition system.

However, Kalinli-Akbacak teaches this limitation:
Boundary information can be used to improve speech recognition performance or complementary information from a speech recognition engine can be used to further improve the boundary detection performance, etc., para 0039 of Kalinli-Akbacak.
Caruana, Zhu, and Kalinli-Akbacak are analogous art directed towards classification.  Caruana teaches a system for selecting classifiers from an ensemble, Zhu teaches a multi-stage image classifier, and Kalinli-Akbacak teaches a system for extracting features from audio data.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate Kalinli-Akbacak’s teaching of audio analysis into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to determine phoneme boundaries from a signal corresponding to recorded audio (Kalinli-Akbacak abstract).

With respect to claim 18, the rejection of claim 17 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the one or more machine learning, data assignment classifiers of the at least one non-final stage comprises a phonetic feature classifier.

However, Kalinli-Akbacak teaches this limitation:
Boundary detection methods have been proposed using auditory attention features. To further improve the boundary accuracy, phoneme posteriors can be combined with auditory attention features. Phoneme posteriors are obtained by training a model (for example a deep neural network) which estimates phoneme class posterior score given acoustic features (mfcc, mel filterbank etc.), para 0015, After the auditory gist feature 127' that characterizes the input sound window 101 has been determined, phone boundaries, vowel boundaries, syllable nucleus, or syllable boundaries may be detected from the auditory gist features and phone posteriors. To perform such detection on a given input sound window, a machine learning algorithm 131, such as a neural network, nearest neighbor classifier, decision tree, and the like, can be used to classify boundaries, such as phone boundaries, vowel boundaries, syllable nucleus, or syllable boundaries, para 0037.
See the rejection of claim 17 for the motivation to combine references.

With respect to claim 19, the rejection of claim 18 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the phonetic feature classifier comprises a multi-layer neural network trained as a phonetic-feature-based phoneme recognizer.

However, Kalinli-Akbacak teaches this limitation:
Boundary detection methods have been proposed using auditory attention features. To further improve the boundary accuracy, phoneme posteriors can be combined with auditory attention features. Phoneme posteriors are obtained by training a model (for example a deep neural network) which estimates phoneme class posterior score given acoustic features (mfcc, mel filterbank etc.), para 0015, a neural network can be used as the machine learning algorithm 131 since it is biologically well motivated. In such a case, the neural network 131 can identify the phone boundaries, vowel boundaries, syllable nucleus, or syllable boundaries within the input sound given the cumulative gist vector it is associated with, para 0037, The AA features and phone posteriors may be augmented and sent to a machine learning algorithm 238, e.g., a three-layer neural network (NN) for boundary estimation, para 0048.
See the rejection of claim 17 for the motivation to combine references.

With respect to claim 20, the rejection of claim 17 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the one or more machine learning, data assignment classifiers of the at least one non-final stage comprises a decision tree for recognition of syllables or words.

However, Kalinli-Akbacak teaches this limitation:
To perform such detection on a given input sound window, a machine learning algorithm 131, such as a neural network, nearest neighbor classifier, decision tree, and the like, can be used to classify boundaries, such as phone boundaries, vowel boundaries, syllable nucleus, or syllable boundaries, para 0037.
See the rejection of claim 17 for the motivation to combine references.

With respect to claim 22, the rejection of claim 1 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the N final-stage machine learning classifiers are speech recognition classifiers.

However, Kalinli-Akbacak teaches this limitation:
Boundary detection methods have been proposed using auditory attention features. To further improve the boundary accuracy, phoneme posteriors can be combined with auditory attention features. Phoneme posteriors are obtained by training a model (for example a deep neural network) which estimates phoneme class posterior score given acoustic features (mfcc, mel filterbank etc.). para 0015, boundary information can be used to improve speech recognition performance or complementary information from a speech recognition engine can be used to further improve the boundary detection performance, etc., para 0039.
See the rejection of claim 17 for the motivation to combine references.

Claim 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Schalkwyk, U.S. Patent Application Publication 2004/0088163 (Schalkwyk).

With respect to independent claim 23, Caruana teaches:
a final stage comprising N1 final-stage machine learning classifiers, wherein N1 > 1, and wherein each of the N1 final-stage machine learning classifiers is for classifying a data input to a classification output (Caruana teaches generating a diverse set of models (about 2000 models for each problem); see section 1.  The models taught by Caruana can be used in classification; see abstract and section 6.); and
at least one non-final stage, wherein each of the at least one non-final stages comprises one or more machine learning, data assignment classifiers that receives each data input … and outputs each data input that is input to the at least one non-final stage of the … machine learning classifier to a selected set of the N1 final-stage machine learning classifiers (Caruana teaches analyzing various datasets using a selected set of models selected from the ensemble; see section 1.  The claim is silent regarding how the N final-stage classifiers are selected, therefore, Caruana’s selection methods, disclosed in section 2, are sufficient to teach the limitation.), wherein the selected set of the N1 final-stage machine learning classifiers comprises one or more of, and less than N1 of, the N1 final-stage machine learning classifiers of … machine learning classifier (Caruana teaches a method for selecting classifier models from an ensemble and the number of selected classifiers is less than the total number of models; see abstract and sections 1 and 2.  The classifiers taught by Caruana are selected from various machine learning classifiers; see the first paragraph in the right column on page 1.); and 
a contextual model machine learning classifier, wherein the contextual model machine learning classifier comprises:
a final stage comprising N2 final-stage machine learning classifiers, wherein N2 > 1, and wherein each of the N2 final-stage machine learning classifiers is for classifying a data input to a classification output (Caruana teaches generating a diverse set of models (about 2000 models for each problem); see section 1.  The models taught by Caruana can be used in classification; see abstract and section 6.); and
at least one non-final stage, wherein each of the at least one non-final stages comprises one or more machine learning, data assignment classifiers that receives each data input to the contextual model machine learning classifier and outputs each data input that is input to the at least one non-final stage of the contextual model machine learning classifier to a selected set of the N2 final-stage machine learning classifiers (Caruana teaches analyzing various datasets using a selected set of models selected from the ensemble; see section 1.  The claim is silent regarding how the N final-stage classifiers are selected, therefore, Caruana’s selection methods, disclosed in section 2, are sufficient to teach the limitation.), wherein the selected set of the N2 final-stage machine learning classifiers comprises one or more of, and less than N2 of, the N2 final-stage machine learning classifiers of the contextual model machine learning classifier, and such that the selected set of the N2 final-stage machine learning classifiers classify the data input assigned by the at least one non-final stage of the contextual model machine learning classifier (Caruana teaches a method for selecting classifier models from an ensemble and the number of selected classifiers is less than the total number of models; see abstract and sections 1 and 2.  The classifiers taught by Caruana are selected from various machine learning classifiers; see the first paragraph in the right column on page 1.).

Caruana does not explicitly teach:
A speech recognition system comprising:
an acoustic model machine learning classifier; and
a contextual model machine learning classifier.

However, Schalkwyk teaches these limitations:
A speech recognition system (Schalkwyk teaches multi-lingual speech recognition in para 0006.) comprising:
an acoustic model machine learning classifier, wherein the acoustic model machine learning classifier (A speech recognizer 160 uses this context-dependent graph, as well as acoustic models 162 for each of the languages, to convert input speech utterances to word sequence, where different words in the output word sequences may come from different languages, para 0047); and a contextual model machine learning classifier (multi-lingual speech recognition with context modeling, para 0002, in the context of combination of separately trained speech recognition models for different languages that are combined for recognition. An example of such a situation is training of phonetic models for English, and later training a recognizer that is tuned to digit strings using word-dependent subword units. Cross-word context modeling between phonetically-represented words and digits represented by word-dependent features can then use the "language"-independent features to determine appropriate cross-word context models, para 0093.) comprises:
Caruana and Schalkwyk are analogous art directed towards classification.  Caruana teaches a system for selecting classifiers from an ensemble and Schalkwyk teaches a speech recognition system.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate Schalkwyk’s teaching of speech recognition into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to implement a multi-lingual recognition system that permits words from different languages to be recognized; see para 0006.


Claim 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Schalkwyk, U.S. Patent Application Publication 2004/0088163 (Schalkwyk); in further view of LeBoeuf et al., U.S. Patent Application Publication 2011/0075851 (LeBoeuf).

With respect to claim 24, the rejection of claim 23 is incorporated.  Further, Caruana and Schalkwyk do not explicitly disclose:
the data inputs are spectrograms.

However, LeBoeuf teaches this limitation:
LeBoeuf teaches analyzing spectrograms in para 0032.
Caruana, Schalkwyk, and LeBoeuf are analogous art directed towards classification.  Caruana teaches a system for selecting classifiers from an ensemble, Schalkwyk teaches a speech recognition system, and LeBoeuf teaches a multi-stage analysis system for labeling audio and video data using metadata.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate LeBoeuf’s teaching of spectrogram analysis into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to automate decisions so users can focus attention elsewhere, as described in para 0013.

Claim 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Schalkwyk, U.S. Patent Application Publication 2004/0088163 (Schalkwyk); and in further view of Kalinli-Akbacak, U.S. Patent Application Publication 2014/0149112 (Kalinli-Akbacak).

With respect to claim 25, the rejection of claim 23 is incorporated.  Further, Caruana and Schalkwyk do not explicitly disclose:
the data inputs are phonemes.

However, Kalinli-Akbacak teaches this limitation:
Kalinli-Akbacak teaches determining phoneme boundaries in the abstract and para 0015.  In order to determine a phoneme boundary, the phoneme must be input into the algorithms used by Kalinli-Akbacak.
Caruana, Schalkwyk, and Kalinli-Akbacak are analogous art directed towards classification.  Caruana teaches a system for selecting classifiers from an ensemble, Schalkwyk teaches a speech recognition system, and Kalinli-Akbacak teaches a system for extracting features from audio data.
It would have been obvious for one of ordinary skill in the art of data classification to incorporate Kalinli-Akbacak’s teaching of phoneme audio analysis into Caruana’s system at the time of filing.  It would have been obvious because one of ordinary skill would be motivated to determine phoneme boundaries from a signal corresponding to recorded audio (Kalinli-Akbacak abstract).

Claim 28 is/are rejected under 35 U.S.C. 103 as being unpatentable over Caruana et al., “Ensemble Selection from Libraries of Models” (Caruana); in view of Zhu et al., U.S. Patent Application Publication 2016/0275374 (Zhu); in view of Van Der Made et al., U.S. Patent Application Publication 2017/0024644 (Van Der Made).

With respect to claim 28, the rejection of claim 26 is incorporated.  Further, Caruana and Zhu do not explicitly disclose:
the processing cores comprise processing cores of an AI accelerator.

However, Van Der Made teaches this limitation:
Van Der Made teaches a neural network accelerator system that may be executed on a printed circuit board; see abstract.
Caruana, Zhu, and Van Der Made are analogous art directed towards classification.  Caruana teaches a system for selecting classifiers from an ensemble, Zhu teaches a multi-stage image classifier and Van Der Made teaches a neural network (classifier) accelerator.


Response to Arguments
Applicant's arguments filed January 22, 2021 have been fully considered but they are not persuasive.
Beginning on page 8 of the remarks, Applicant argues that the prior art does not teach the newly amended claim limitations.  Caruana has been incorporated into the rejection of independent claims 1 and 23 above to teach these features.  Applicant also states that the amended claims clarify that the non-final stage data assignment classifiers receive a data input and determine which of the N final-stage machine learning classifiers should receive that data input for classification purposes, but such details are absent from the amended claims.  For example, claim 1 recites data assignment classifiers that receive each data input and output it to a selected set of final-stage machine learning classifiers.  However, claim 1 does not specify how the data assignment classifier analyzes the input to determine which final-stage classifier the data should be routed to.  Additionally, the claims do not provide any detail regarding how the set of final-stage classifiers used to classify the input is selected.  The claims merely require that a set of classifiers be selected and that the selected set contain fewer than the total number of classifiers, i.e., a proper subset.  Details regarding how inputs are analyzed to determine which final-stage classifier or set of classifiers should perform the classification would greatly aid in overcoming the art.  As currently drafted, however, the claims are very broad, and the combination of art above is sufficient to teach the limitations.
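For illustration only, the breadth of the limitation discussed above can be sketched in Python.  The sketch is not taken from the application or from any cited reference.  The names make_final_stage, assign, and classify, the numeric inputs, and the nearest-index assignment rule are all hypothetical assumptions.  The sketch shows only the structural requirement as read above: each input is passed to a selected set containing at least one and fewer than N of the N final-stage classifiers.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a final-stage classifier maps a numeric input to a label.
Classifier = Callable[[float], str]


def make_final_stage(n: int) -> List[Classifier]:
    """Build N toy final-stage classifiers (illustrative stand-ins only)."""
    return [lambda x, i=i: f"clf{i}:{'pos' if x >= i else 'neg'}" for i in range(n)]


def assign(x: float, n: int, k: int) -> List[int]:
    """Toy data-assignment rule: pick k classifier indices, with 1 <= k < N.

    Picking the k indices nearest to x is an assumption made purely for
    illustration; the claims as discussed above do not specify any rule.
    """
    if not (1 <= k < n):
        raise ValueError("selected set must hold at least one and fewer than N classifiers")
    return sorted(range(n), key=lambda i: (abs(i - x), i))[:k]


def classify(x: float, final_stage: Sequence[Classifier], k: int) -> List[str]:
    """Route x to the selected subset and collect that subset's outputs."""
    selected = assign(x, len(final_stage), k)
    return [final_stage[i](x) for i in selected]
```

Under these assumptions, any rule that returns a non-empty proper subset of the final-stage classifiers fits the structure; the particular rule inside assign is where additional claim detail would narrow the scope.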

Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kang et al., “Partially Connected Feedforward Neural Networks Structured by Input Types” teaches non-fully connected neural networks and data routing.
Ko et al., “From dynamic classifier selection to dynamic ensemble selection” teaches classifier selection based on input data.
Masci et al., “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction” teaches data routing and machine learning.
Pal et al., “Decision Tree Based Classification of Remotely Sensed Data” teaches data routing and classification via a decision tree.
Polikar, “Ensemble Learning” teaches methods of selecting classifiers from an ensemble.
Yan et al., “Sorting-Based Dynamic Classifier Ensemble Selection” teaches methods for selecting classifiers.


Conclusion
Claims 1-28 are rejected.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T PELLETT whose telephone number is (571)270-7156.  The examiner can normally be reached on Monday - Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached on 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/DANIEL T PELLETT/Primary Examiner, Art Unit 2121