DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Objections 
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, “a compressed representation”, “compressed channels”, “the selected compressed channels”, and “learning task” must be shown in the drawings or the features must be canceled from claims 1-20. No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

35 U.S.C. 101 requires that a claimed invention must fall within one of the four eligible categories of invention (i.e. process, machine, manufacture, or composition of matter) and must not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. Three categories of subject matter are found to be judicially recognized exceptions to 35 U.S.C. § 101 (i.e. patent ineligible): (1) laws of nature, (2) physical phenomena, and (3) abstract ideas. To be patent-eligible, a claim directed to a judicial exception must as a whole be directed to significantly more than the exception itself. Hence, the claim must describe a process or product that applies the exception in a meaningful way, such that it is more than a drafting effort designed to monopolize the exception.
Claims 6 and 10 are rejected under 35 U.S.C. § 101 as not falling within one of the four statutory categories of invention because claims 6 and 10 recite a “control signal”. However, a signal is not within one of the four statutory categories (i.e. process, machine, manufacture, or composition of matter).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(a): 
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention. 
The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112: 
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by the inventor of carrying out his invention.
The following is a quotation of 35 U.S.C. 112(b): 
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of pre-AIA  35 U.S.C. 112, second paragraph: 
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 12, 16, and 20 recite "the selected compressed channels". There is insufficient antecedent basis for this limitation in the claim. It is noted that these claims previously recite “select a part of the compressed channels”. However, "the selected compressed channels" is not the same as "a selected part of the compressed channels". Therefore, claims 1, 12, 16, 20, and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 12, 16, and 20 recite "a compressed representation". It is not clear from the claim language what the format of the compressed representation is. It is not clear whether the compressed representation is in the form of a compressed bitstream, such as an encoded bitstream, or a compressed file, such as a file in the JPEG format. Therefore, claims 1, 12, 16, 20, and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 12, 16, and 20 recite "compressed channels". It is not clear from the claim language whether the compressed channels are color channels, mapping channels, classifier channels, program content channels, or something else. Therefore, claims 1, 12, 16, 20, and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 11-13, 15-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Li et al. (US Patent 11,025,907 B2), (“Li”).

Regarding claim 1, Li meets the claim limitations, as follows:
A machine learning system ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: a compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to use a compression neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to compress an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] into a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51], the compressed representation (i.e. the compressed bitstream) [Li: col. 8, line 51] comprising a sequence of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); a selector having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and a learning module having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to perform a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
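
For orientation only, the rate-distortion comparison described in the cited passages of Li (a single scalar RD cost computed for each examined combination, with the best mode selected) may be pictured with the minimal sketch below. The Lagrangian form D + lambda * R, the function names, and the sample numbers are assumptions made for illustration; the passages quoted above state that rate and distortion are combined into one scalar but do not give a formula.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Combine distortion and rate into one scalar (assumed Lagrangian form)."""
    return distortion + lam * rate


def select_best_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the name of the candidate mode with the lowest RD cost."""
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Hypothetical (distortion, rate) pairs for three partitioning modes.
modes = {
    "none": (40.0, 10.0),
    "split_2": (25.0, 30.0),
    "split_4": (12.0, 70.0),
}
best = select_best_mode(modes, lam=0.5)
```

Under these assumed numbers the middle candidate has the lowest cost, which illustrates that the selection in this sketch is made among coding modes and is driven by the trade-off weight lam.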

Regarding claim 11, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), wherein the selector has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
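
As a reading aid for the limitation "select a plurality of top compressed channels with largest entropies or feature value variances", a minimal sketch of one possible ranking is given below. It is not taken from Li or from the applicant's specification: the function names, the flattening of each channel into a list of values, and the histogram-based entropy estimate are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import pvariance
from typing import Callable, List, Sequence


def channel_variance(values: Sequence[float]) -> float:
    """Population variance of the feature values in one channel."""
    return pvariance(values)


def channel_entropy(values: Sequence[float]) -> float:
    """Shannon entropy in bits of the empirical distribution of the values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_top_channels(
    channels: Sequence[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float] = channel_variance,
) -> List[int]:
    """Return the indices of the k highest-scoring channels, in index order."""
    ranked = sorted(range(len(channels)), key=lambda i: score(channels[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical compressed representation with three flattened channels.
channels = [
    [0, 0, 0, 0],  # constant values
    [0, 1, 2, 3],  # four distinct values
    [0, 0, 4, 4],  # two widely separated values
]
by_variance = select_top_channels(channels, k=2)
by_entropy = select_top_channels(channels, k=1, score=channel_entropy)
```

In this sketch the two criteria need not agree: the third channel has the largest variance, while the second has the largest entropy.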

Regarding claim 12, Li meets the claim limitations, as follows:
A method (i.e. a method) [Li: col. 2, line 23] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
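
The channel progression quoted from Li at col. 23 (IncomingFeature of 256, feature reduction parameter F of 2, threshold parameter of 32) can be worked through with the short sketch below. The stopping rule used here, dividing by F while the result is still at least the threshold and then reducing to a single channel, is an assumption chosen to reproduce the quoted endpoints; the quoted passage elides the middle of the progression, and the function name is hypothetical.

```python
from typing import List


def channel_progression(incoming: int, factor: int, threshold: int) -> List[int]:
    """Channel count at each stage: divide by `factor` down to `threshold`, then 1."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    counts = [incoming]
    # Keep reducing while the next stage would still hold at least `threshold` channels.
    while counts[-1] // factor >= threshold:
        counts.append(counts[-1] // factor)
    counts.append(1)  # assumed final stage that emits a single channel
    return counts


progression = channel_progression(incoming=256, factor=2, threshold=32)
```

With the quoted parameters this sketch yields the stage sizes 256, 128, 64, 32, and 1. These are the widths of successive classifier layers in the sketch, which is a different operation from ranking the channels of one representation against each other.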

Regarding claim 13, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows.
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], further comprising: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 15, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows.
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], wherein selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] the part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) comprises:
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].

Regarding claim 16, Li meets the claim limitations, as follows:
An apparatus (i.e. an apparatus) [Li: col. 2, line 22] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: at least one memory (i.e. a memory) [Li: col. 6, line 14] for storing instructions (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]; and at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].

Regarding claim 17, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).  

Regarding claim 19, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: 
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
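For illustration only, and not as part of the disclosure of Li or of the application, the claimed step of selecting the top compressed channels with the largest entropies or feature value variances might be sketched as follows. The sketch assumes each compressed channel is available as a flat list of quantized feature values; the function names `channel_variance`, `channel_entropy`, and `select_top_channels` are hypothetical and do not appear in the record.

```python
import math
from collections import Counter


def channel_variance(values):
    """Population variance of one channel's feature values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def channel_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of the values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_top_channels(channels, k, score=channel_variance):
    """Return the indices of the k highest-scoring channels, in channel order."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: score(channels[i]),
                    reverse=True)
    return sorted(ranked[:k])


# Hypothetical 4-channel compressed representation (assumed data).
channels = [
    [0, 0, 0, 0],    # constant channel: zero variance, zero entropy
    [0, 1, 0, 1],
    [0, 4, 8, 12],
    [2, 2, 2, 3],
]

top_by_variance = select_top_channels(channels, 2)
top_by_entropy = select_top_channels(channels, 2, score=channel_entropy)
```

Either score ranks the channels independently of one another, so the selection reduces to a sort followed by keeping the first k indices.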

Regarding claim 20, Li meets the claim limitations, as follows:
A non-transitory computer readable storage medium (i.e. a memory) [Li: col. 6, line 14] storing a set of instructions that are executable by one or more processing devices to cause a computer to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique, which is used during the rate-distortion analysis, which determines what channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
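For illustration only, the channel-count progression quoted from Li at col. 23, line 10-16 (IncomingFeature of 256, feature reduction parameter F of 2, threshold parameter of 32) can be worked through as below. The sketch assumes the count is divided by F until it is no longer greater than the threshold and then collapses to a single output channel; that stopping rule is an assumption read from the quoted passage, and the function name `channel_progression` is hypothetical.

```python
def channel_progression(incoming_features, reduction_factor, threshold):
    """List the channel counts produced by repeated division by the factor.

    Assumption: reduction continues while the count exceeds the threshold,
    after which a final stage reduces the count to 1.
    """
    if reduction_factor <= 1:
        raise ValueError("reduction_factor must be greater than 1")
    counts = [incoming_features]
    current = incoming_features
    while current > threshold:
        current //= reduction_factor
        counts.append(current)
    counts.append(1)  # final single-channel output
    return counts


# Values quoted from Li for the classifier 918.
progression = channel_progression(256, 2, 32)
```

With the quoted values the sketch yields 256, 128, 64, 32 and 1, which matches successive division by F = 2 followed by the final reduction to one channel.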

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA  35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA  35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA  35 U.S.C. 103(c) and potential pre-AIA  35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA  35 U.S.C. 103(a).

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Patent 11,025,907 B2), (“Li”), in view of Zhang et al. (US Patent 10,721,471 B2), (“Zhang”).
Regarding claim 1, Li meets the claim limitations, as follows:
A machine learning system ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: a compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to use a compression neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to compress an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] into a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51], the compressed representation (i.e. the compressed bitstream) [Li: col. 8, line 51] comprising a sequence of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); a selector having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique, which is used during the rate-distortion analysis, which determines what channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and a learning module having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to perform a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
In the same field of endeavor Zhang further discloses the claim limitations as follows:
selecting a part of the compressed channels from the compressed representation (i.e. the discussed features extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; and performing a learning task on the selected compressed channels (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with a low computational complexity [Zhang: col. 3, line 41-54].
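For illustration only, the rate-distortion comparison quoted from Li in the claim 1 mapping (a metric combining rate and distortion into a single scalar RD cost, with a best mode selected from the candidates) might be sketched as follows. The weighted-sum form of the cost, the multiplier `lam`, and the candidate names and values are assumptions made for the sketch; the quoted passages state only that the metric "can combine the rate and distortion" into a scalar.

```python
def rd_cost(distortion, rate, lam):
    """Scalar RD cost as a weighted sum (assumed form): D + lam * R."""
    return distortion + lam * rate


def select_best_mode(candidates, lam):
    """Return the name of the candidate mode with the lowest RD cost.

    candidates maps a mode name to a (distortion, rate) pair.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Hypothetical candidate modes with (distortion, rate) values.
candidates = {
    "none": (100.0, 10.0),
    "split": (40.0, 50.0),
    "horizontal": (70.0, 30.0),
}

best_low_lambda = select_best_mode(candidates, lam=1.0)   # rate is cheap
best_high_lambda = select_best_mode(candidates, lam=2.0)  # rate is costly
```

Raising the multiplier penalizes rate more heavily, so the selected mode shifts from the low-distortion candidate toward the low-rate candidate.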

Regarding claim 2, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a first multiplexer (i.e. circuit) [Li: col. 39, line 9] communicatively coupled with (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48] the compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] and the selector and having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] multiplex the compressed representation from the compressor and the selected compressed channels from the selector (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48; Figs. 2, 7, 9, 12-15].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 1, further comprising: a first multiplexer communicatively coupled with the compressor and the selector and having circuitry configured to multiplex the compressed representation from the compressor and the selected compressed channels from the selector.  
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) (i.e. device 110 may select) [Zhang: col. 4, line 10; Figs. 12-14] 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with a low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 3, Li meets the claim limitations as set forth in claim 2. Li further meets the claim limitations as follows.
The machine learning system of claim 2 ((i.e. a computing system) [Li: col. 6, line 1] ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33];  (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a decompressor (i.e. a decoder) [Li: col. 3, line 1; Fig. 5] having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  decompress the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]); and a second multiplexer having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] receive the compressed representation (i.e. other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream) [Li: col. 5, line 44-47] or the selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. 
As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]), the second multiplexer (i.e. circuit) [Li: col. 39, line 9] being communicatively coupled with the learning module and decompressor (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48] the compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4]  and having circuitry configured (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  to output the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the decompressor (i.e. 
The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35] or the selected compressed channels to the learning module ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]).
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 2, further comprising: a decompressor having circuitry configured to decompress the compressed representation to generate a decompressed image; and a second multiplexer having circuitry configured to receive the compressed representation or the selected compressed channels, the second multiplexer being communicatively coupled with the learning module and decompressor and having circuitry configured to output the compressed representation to the decompressor or the selected compressed channels to the learning module.   
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a second multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with a low computational complexity [Zhang: col. 3, line 41-54].
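For illustration only, the routing recited for the second multiplexer of claim 3 (outputting the compressed representation to the decompressor, or the selected compressed channels to the learning module) might be sketched as follows. The data classes, the function name `second_multiplexer`, and the stand-in decompressor and learning module are hypothetical and are not drawn from Li, Zhang, or the application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CompressedRepresentation:
    """Full set of compressed channels (assumed container)."""
    channels: List[List[float]]


@dataclass
class SelectedChannels:
    """Subset of channels chosen by the selector (assumed container)."""
    indices: List[int]
    channels: List[List[float]]


def second_multiplexer(payload, decompressor: Callable, learning_module: Callable):
    """Route the payload to the decompressor or the learning module by type."""
    if isinstance(payload, SelectedChannels):
        return learning_module(payload)
    if isinstance(payload, CompressedRepresentation):
        return decompressor(payload)
    raise TypeError("unsupported payload type")


# Stand-in components that only report which path was taken.
def fake_decompressor(rep):
    return ("decompressed", len(rep.channels))


def fake_learning_module(sel):
    return ("learned", sel.indices)


full = CompressedRepresentation(channels=[[0.0], [1.0], [2.0]])
subset = SelectedChannels(indices=[1, 2], channels=[[1.0], [2.0]])

routed_full = second_multiplexer(full, fake_decompressor, fake_learning_module)
routed_subset = second_multiplexer(subset, fake_decompressor, fake_learning_module)
```

Dispatching on the payload type is one possible reading of "or" in the claim language; a control signal accompanying the payload would serve equally well.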

Regarding claim 4, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1] ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33];  (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a transmitter (i.e. A transmitting station 102 can be, for example a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices) [Li: col. 5, line 16-21] communicatively coupled with the first multiplexer and configured to transmit the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1] or the selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. 
The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]); and a receiver (i.e. In one example, the receiving station 106 can be a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices) [Li: col. 5, line 32-38] communicatively coupled with the second multiplexer and configured to receive the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1] or the selected compressed channels ((i.e. 
determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]) from the transmitter and provide the received compressed representation or selected compressed channels to the second multiplexer (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a transmitter communicatively coupled with the first multiplexer and configured to transmit the compressed representation or the selected compressed channels; and a receiver communicatively coupled with the second multiplexer and configured to receive the compressed representation or the selected compressed channels from the transmitter and provide the received compressed representation or selected compressed channels to the second multiplexer.  
However, in the same field of endeavor Zhang further discloses the claim limitations and the deficient claim limitations, as follows:
a transmitter (i.e. transmitters) [Zhang: col. 20, line 14] (i.e. bit stream multiplexer) [Zhang: col. 16, line 53; Figs. 12-14] (i.e. receivers) [Zhang: col. 20, line 14] (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14] (i.e. transmitters) [Zhang: col. 20, line 14] (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14].   
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with a low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 5, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: 
a memory (i.e. a memory) [Li: col. 6, line 14] for storing the compressed representation or the selected compressed channels ((i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48]; (i.e. the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing) [Li: col. 6, line 30-33; Figs. 4-5]) from the first multiplexer, wherein the second multiplexer has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] read the stored compressed representation or selected compressed channels from the memory (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a memory for storing the compressed representation or the selected compressed channels from the first multiplexer, wherein the second multiplexer has circuitry configured to read the stored compressed representation or selected compressed channels from the memory.   
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]), wherein the second multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 6, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising:
a controller (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] communicatively coupled with the first multiplexer and the second multiplexer (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured to (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] send a control signal (i.e. communicate the binary data) [Li: col. 13, line 14-15] to the first multiplexer and the second multiplexer (i.e. circuit) [Li: col. 39, line 9], wherein the first multiplexer has circuitry configured to (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] output (i.e. output to the compressed bitstream) [Li: col. 8, line 51], according to the control signal (i.e. carrying out any of the methods, algorithms, or instructions described herein) [Li: col. 39, line 22-24], the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] or the selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. 
In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]), and the second multiplexer has circuitry configured (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] to output (i.e. output to the compressed bitstream) [Li: col. 8, line 51], according to the control signal (i.e. carrying out any of the methods, algorithms, or instructions described herein) [Li: col. 39, line 22-24], the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the decompressor (i.e. The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. 
Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35] or the selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]) to the learning module (i.e. a machine-learning model) [Li: col. 12, line 33].    
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a controller communicatively coupled with the first multiplexer and the second multiplexer and having circuitry configured to send a control signal to the first multiplexer and the second multiplexer, wherein the first multiplexer has circuitry configured to output, according to the control signal, the compressed representation or the selected compressed channels, and the second multiplexer has circuitry configured to output, according to the control signal, the compressed representation to the decompressor or the selected compressed channels to the learning module.    
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a controller communicatively coupled with the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and the second multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and having circuitry configured to send a control signal (i.e. In various implementations, platform 1302 may receive
control signals from navigation controller) [Zhang: col. 19, line 26-17] to the first multiplexer and the second multiplexer (i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14], wherein the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) (i.e. signal bearing media providing instructions) [Zhang: col. 16, line 66-67; Figs. 12-14], ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].
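For illustration only, the signal routing recited in claim 6 can be sketched in Python as follows. The sketch is not taken from Li, Zhang, or the application. The names `Control`, `first_mux`, `second_mux`, and `route`, and the use of a two-valued control signal shared by both multiplexers, are assumptions made for the example.

```python
from enum import Enum


class Control(Enum):
    # Hypothetical two-valued control signal sent by the controller.
    FULL = "compressed_representation"
    SELECTED = "selected_channels"


def first_mux(control, compressed, selected):
    """Output the full compressed representation or the selected channels."""
    return compressed if control is Control.FULL else selected


def second_mux(control, payload):
    """Route the payload to the decompressor or to the learning module."""
    destination = "decompressor" if control is Control.FULL else "learning_module"
    return destination, payload


def route(control, compressed, selected):
    """Model a controller that drives both multiplexers with one signal."""
    return second_mux(control, first_mux(control, compressed, selected))


# Hypothetical data: three compressed channels, one of which was selected.
compressed = {"ch0": [1, 2], "ch1": [3, 4], "ch2": [5, 6]}
selected = {"ch1": [3, 4]}

print(route(Control.FULL, compressed, selected)[0])
print(route(Control.SELECTED, compressed, selected)[0])
```

Under these assumptions, the same control signal determines both what the first multiplexer outputs and where the second multiplexer sends it.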

Regarding claim 7, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1] ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33];  (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: 44Attorney Docket No.: 12852.0420-00000 Alibaba Ref No.: A29759US a decompressor (i.e. a decoder) [Li: col. 3, line 1; Fig. 5] having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] decompress the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]); a multiplexer having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] receive the compressed representation (i.e. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream) [Li: col. 5, line 45-47], the multiplexer being communicatively coupled with the selector and the decompressor  (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured to (i.e. 
microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] output the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the selector or the decompressor (i.e. The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35], wherein the selector (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is communicatively coupled (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] with the learning module (i.e. a machine-learning model) [Li: col. 12, line 33].   
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 1, further comprising: a decompressor having circuitry configured to decompress the compressed representation to generate a decompressed image; a multiplexer having circuitry configured to receive the compressed representation, the multiplexer being communicatively coupled with the selector and the decompressor and having circuitry configured to output the compressed representation to the selector or the decompressor, wherein the selector is communicatively coupled with the learning module.
However, in the same field of endeavor Zhang further discloses the claim limitations and the deficient claim limitations, as follows:
a multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 8, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1] ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33];  (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a transmitter (i.e. A transmitting station 102 can be, for example a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices) [Li: col. 5, line 16-21] communicatively coupled with the compressor and configured to transmit the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1]; and a receiver (i.e. In one example, the receiving station 106 can be a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices) [Li: col. 
5, line 32-38] communicatively coupled with the multiplexer and configured to receive the compressed representation from the transmitter and provide the received compressed representation to the multiplexer (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1].  
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a transmitter communicatively coupled with the compressor and configured to transmit the compressed representation; and a receiver communicatively coupled with the multiplexer and configured to receive the compressed representation from the transmitter and provide the received compressed representation to the multiplexer.   
However, in the same field of endeavor Zhang further discloses the claim limitations and the deficient claim limitations, as follows:
a transmitter (i.e. transmitters) [Zhang: col. 20, line 14]; a receiver (i.e. receivers) [Zhang: col. 20, line 14]; the transmitter (i.e. transmitters) [Zhang: col. 20, line 14]; the multiplexer (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 9, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising:
a memory (i.e. a memory) [Li: col. 6, line 14] for storing the compressed representation from the compressor (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48], wherein the multiplexer (i.e. circuit) [Li: col. 39, line 9] is configured to read the stored compressed representation from the memory (i.e. the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing) [Li: col. 6, line 30-33; Figs. 4-5].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a memory for storing the compressed representation from the compressor, wherein the multiplexer is configured to read the stored compressed representation from the memory.    
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
wherein the multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 10, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising:
a controller (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] communicatively coupled with the multiplexer (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] to send a control signal (i.e. communicate the binary data) [Li: col. 13, line 14-15] to the multiplexer (i.e. circuit) [Li: col. 39, line 9].  
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a controller communicatively coupled with the multiplexer and having circuitry configured to send a control signal to the multiplexer.      
However, in the same field of endeavor Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a controller communicatively coupled with the multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and having circuitry configured to send a control signal (i.e. In various implementations, platform 1302 may receive control signals from navigation controller) [Zhang: col. 19, line 26-17] to the multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 11, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), wherein the selector has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model.
In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter ( e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
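For illustration only, the claim 11 limitation of selecting the top compressed channels by feature value variance can be sketched in Python as follows. The sketch is not taken from Li or from the application. The function name `select_top_channels`, the flat-list channel layout, and the use of population variance as the ranking score are assumptions made for the example.

```python
from statistics import pvariance


def select_top_channels(channels, k):
    """Return the indices of the k channels with the largest value variance."""
    ranked = sorted(
        range(len(channels)),
        key=lambda i: pvariance(channels[i]),
        reverse=True,
    )
    # Report the winners in their original channel order.
    return sorted(ranked[:k])


# Hypothetical compressed representation: four channels of four values each.
channels = [
    [0.0, 0.0, 0.0, 0.0],    # variance 0
    [1.0, 3.0, 1.0, 3.0],    # variance 1
    [0.0, 10.0, 0.0, 10.0],  # variance 25
    [2.0, 2.5, 2.0, 2.5],    # variance 0.0625
]

top = select_top_channels(channels, 2)
print(top)
```

An entropy-based ranking would follow the same pattern with a different scoring function in place of `pvariance`.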

Regarding claim 12, Li meets the claim limitations, as follows:
A method (i.e. a method) [Li: col. 2, line 23] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
In the same field of endeavor Zhang further discloses the claim limitations as follows:
selecting a part of the compressed channels from the compressed representation (i.e. the discussed features extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; and performing a learning task on the selected compressed channels (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 13, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows.
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], further comprising: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 14, Li meets the claim limitations as set forth in claim 13. Li further meets the claim limitations as follows.
The method of claim 13 (i.e. a method) [Li: col. 2, line 23], further comprising: 
determining (i.e. determining a mode decision) [Li: col. 39, line 66-67] whether the learning task (i.e. The machine-learning model may be trained using the vast amount of training data that is available from an encoder performing standard encoding techniques, such as those described below. More specifically, the training data can be used during the learning phase of machine learning to derive (e.g., learn, infer, etc.) the machine-learning model that is (e.g., defines, constitutes, etc.) a mapping from the input data (e.g., block data) to an output) [Li: col. 4, line 43-50] or a reconstruction task (i.e. a reconstruction stage) [Li: col. 8, line 2] is to be performed (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning) [Li: col. 12, line 32-33; Fig. 7]; in response to the learning task being determined to be performed (i.e. The process 1600 trains, using input data, a machine-learning model to infer one or more mode decisions. The process 1600 then uses the trained machine-learning model to infer a mode decision for an image block) [Li: col. 34, line 40-33; Fig. 16], selecting the part of the compressed channels from the compressed representation ((i.e. At 1602, the process 1600 trains the machine-learning (ML) model. The ML model can be trained using a training data 1612. Each training datum of the training data 1612 can include a video block that was encoded by traditional encoding methods ( e.g., by an encoder such as described with respect to FIGS. 4 and 6-8); a QP used by the encoder; zero or more additional inputs corresponding to inputs used by the encoder in determining the mode decision ( e.g., block partitioning and optionally prediction mode and/or transform unit size) for encoding the video block; and the resulting mode decision determined by the encoder.) [Li: col. 12, line 54-64-33; Figs. 4, 6-8, 16-17]; (i.e. In an example, the mode decision can be a quad-tree partition decision of the image block. 
The image block can be a block of an image (e.g., a video frame) that is encoded using intra-prediction. In another example, the mode decision can be a partition that includes partitions described with respect to FIG. 17 described below. As further described below, some of the partitions of FIG. 17 include square and non-square sub-partition; and each of the square sub-partitions can be further partitioned according to one of the partitions of FIG. 17) [Li: col. 34, line 44-53; Figs. 16-17]; and in response to the reconstruction task being determined to be performed (i.e. The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that both the encoder 400 and a decoder 500 (described below) use the same reference frames and blocks to decode the compressed bitstream 420) [Li: col. 8, line 58-62], decompressing the compressed representation to generate the decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).
In the same field of endeavor Zhang also discloses the claim limitations as follows:
in response to the learning task being determined to be performed (i.e. Processing may continue at operation 1103, where a machine learning engine is applied to a feature vector including the features generated at operation 1102, a target bitrate for the picture, and a resolution of the picture to generate an estimated quantization parameter for encoding the picture) [Zhang: col. 16, line 4-9], selecting the part of the compressed channels from the compressed representation (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]; and in response to the reconstruction task being determined to be performed (i.e. combining reconstructed residual blocks with reference blocks) [Zhang: col. 4, line 34-35], decompressing the compressed representation to generate the decompressed image (i.e. The compressed signal or data may then be decoded via a decoder that decodes or decompresses the signal or data for display to a user) [Zhang: col. 1, line 16-19].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].
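
For illustration only, the conditional structure recited in claim 14 (determine which task is to be performed, then either select a part of the compressed channels or decompress the representation) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Li, Zhang, or the application: the names Task, handle_compressed, select_fn, and decode_fn are hypothetical, and the compressed representation is assumed to be an array of shape (channels, height, width).

```python
from enum import Enum

import numpy as np


class Task(Enum):
    """Hypothetical labels for the two tasks named in the claim."""
    LEARNING = "learning"
    RECONSTRUCTION = "reconstruction"


def handle_compressed(compressed, task, select_fn, decode_fn):
    """Dispatch on the determined task.

    compressed: assumed array of shape (channels, height, width).
    select_fn:  hypothetical callable returning a subset of the channels.
    decode_fn:  hypothetical callable returning a decompressed image.
    """
    if task is Task.LEARNING:
        # Learning branch: keep only a part of the compressed channels.
        return select_fn(compressed)
    if task is Task.RECONSTRUCTION:
        # Reconstruction branch: decompress the whole representation.
        return decode_fn(compressed)
    raise ValueError(f"unknown task: {task!r}")


# Stand-in callables, used only to exercise the dispatch logic.
def first_two_channels(z):
    return z[:2]


def sum_over_channels(z):
    # Placeholder "decoder"; a real decoder would be a trained network.
    return z.sum(axis=0)
```

A call such as handle_compressed(z, Task.LEARNING, first_two_channels, sum_over_channels) returns the channel subset, while the same call with Task.RECONSTRUCTION returns the stand-in decoded image.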

Regarding claim 15, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows.
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], wherein selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] the part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique that is used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach) comprises:
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers.
The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
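
For illustration only, the limitation "selecting a plurality of top compressed channels with largest entropies or feature value variances" can be read as a per-channel scoring step followed by a top-k selection. The following is a minimal sketch under stated assumptions and does not reproduce Li, Zhang, or the applicant's implementation: the compressed representation is assumed to be a NumPy array of shape (channels, height, width), the entropy is estimated from a histogram with a hypothetical bin count of 16, and the function names are hypothetical.

```python
import numpy as np


def channel_variances(z):
    """Variance of the feature values in each channel of z, shape (C, H, W)."""
    return z.reshape(z.shape[0], -1).var(axis=1)


def channel_entropies(z, bins=16):
    """Histogram-based Shannon entropy (in bits) of each channel.

    The bin count is an assumption made for this sketch.
    """
    entropies = np.empty(z.shape[0])
    for c, channel in enumerate(z):
        counts, _ = np.histogram(channel, bins=bins)
        p = counts[counts > 0] / counts.sum()
        entropies[c] = -(p * np.log2(p)).sum()
    return entropies


def select_top_channels(z, k, criterion="variance"):
    """Return the k highest-scoring channels and their original indices."""
    if criterion == "variance":
        scores = channel_variances(z)
    elif criterion == "entropy":
        scores = channel_entropies(z)
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    top = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    top = np.sort(top)                  # restore the original channel order
    return z[top], top
```

The selected sub-array could then be passed to a downstream learning model in place of the full representation; which scoring criterion is preferable would depend on the task and is not addressed by this sketch.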

Regarding claim 16, Li meets the claim limitations, as follows:
An apparatus (i.e. an apparatus) [Li: col. 2, line 22] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: at least one memory (i.e. a memory) [Li: col. 6, line 14] for storing instructions (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]; and at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique that is used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
In the same field of endeavor Zhang further discloses the claim limitations as follows:
selecting a part of the compressed channels from the compressed representation (i.e. the discussed features extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; and performing a learning task on the selected compressed channels (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 17, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).  

Regarding claim 18, Li meets the claim limitations as set forth in claim 17. Li further meets the claim limitations as follows.
The apparatus of claim 17 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]:
determining (i.e. determining a mode decision) [Li: col. 39, line 66-67] whether the learning task (i.e. The machine-learning model may be trained using the vast amount of training data that is available from an encoder performing standard encoding techniques, such as those described below. More specifically, the training data can be used during the learning phase of machine learning to derive (e.g., learn, infer, etc.) the machine-learning model that is (e.g., defines, constitutes, etc.) a mapping from the input data (e.g., block data) to an output) [Li: col. 4, line 43-50] or a reconstruction task (i.e. a reconstruction stage) [Li: col. 8, line 2] is to be performed (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning) [Li: col. 12, line 32-33; Fig. 7]; in response to the learning task being determined to be performed (i.e. The process 1600 trains, using input data, a machine-learning model to infer one or more mode decisions. The process 1600 then uses the trained machine-learning model to infer a mode decision for an image block) [Li: col. 34, line 40-33; Fig. 16], selecting the part of the compressed channels from the compressed representation ((i.e. At 1602, the process 1600 trains the machine-learning (ML) model. The ML model can be trained using a training data 1612. Each training datum of the training data 1612 can include a video block that was encoded by traditional encoding methods ( e.g., by an encoder such as described with respect to FIGS. 4 and 6-8); a QP used by the encoder; zero or more additional inputs corresponding to inputs used by the encoder in determining the mode decision ( e.g., block partitioning and optionally prediction mode and/or transform unit size) for encoding the video block; and the resulting mode decision determined by the encoder.) [Li: col. 12, line 54-64-33; Figs. 4, 6-8, 16-17]; (i.e. In an example, the mode decision can be a quad-tree partition decision of the image block. 
The image block can be a block of an image (e.g., a video frame) that is encoded using intra-prediction. In another example, the mode decision can be a partition that includes partitions described with respect to FIG. 17 described below. As further described below, some of the partitions of FIG. 17 include square and non-square sub-partition; and each of the square sub-partitions can be further partitioned according to one of the partitions of FIG. 17) [Li: col. 34, line 44-53; Figs. 16-17]; and in response to the reconstruction task being determined to be performed (i.e. The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that both the encoder 400 and a decoder 500 (described below) use the same reference frames and blocks to decode the compressed bitstream 420) [Li: col. 8, line 58-62], decompressing the compressed representation to generate the decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).
In the same field of endeavor Zhang also discloses the claim limitations as follows:
in response to the learning task being determined to be performed (i.e. Processing may continue at operation 1103, where a machine learning engine is applied to a feature vector including the features generated at operation 1102, a target bitrate for the picture, and a resolution of the picture to generate an estimated quantization parameter for encoding the picture) [Zhang: col. 16, line 4-9], selecting the part of the compressed channels from the compressed representation (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]; and in response to the reconstruction task being determined to be performed (i.e. combining reconstructed residual blocks with reference blocks) [Zhang: col. 4, line 34-35], decompressing the compressed representation to generate the decompressed image (i.e. The compressed signal or data may then be decoded via a decoder that decodes or decompresses the signal or data for display to a user) [Zhang: col. 1, line 16-19].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 19, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: 
selecting  (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter ( e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].

Regarding claim 20, Li meets the claim limitations, as follows:
A non-transitory computer readable storage medium (i.e. a memory) [Li: col. 6, line 14] storing a set of instructions that are executable by one or more processing devices to cause a computer to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image (i.e. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420) [Li: col. 7, line 57-60; Fig. 4] with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. FIG. 9 is a block diagram of an example of a convolutional neural network (CNN) 900 for a mode decision according to implementations of this disclosure) [Li: col. 16, line 57-59]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique that is used during the rate-distortion process that determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]) on the selected compressed channels ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process. The branch 903-B extracts 256 feature maps (i.e., feature maps 908), each of size 8x8. The branch 903-B first extracts, at a first layer of the branch 903-B, feature maps 906 by convolving the block 902 with 128 filters, each of size 4x4, and using a stride of 4 (i.e., a stride that is equal to the filter size). At a second layer of the branch 903-B, each of the 128 feature maps of the feature maps 906 is convolved with two 2x2 filters, using a stride of 2, thereby resulting in the feature maps 908.) [Li: col. 18, line 44-55]; (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37]).
In the same field of endeavor Zhang further discloses the claim limitations as follows:
selecting a part of the compressed channels from the compressed representation (i.e. the discussed features extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; and performing a learning task on the selected compressed channels (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with a low computational complexity [Zhang: col. 3, line 41-54].
Reference Notice 
Additional prior art, included in the Notice of References Cited, made of record and not relied upon, is considered pertinent to applicant's disclosure.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang whose telephone number is (408) 918-7529.  The examiner can normally be reached Monday-Thursday, 8:30 am - 5:00 pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath Perungavoor, can be reached at 571-272-7455.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Philip P. Dang/
Primary Examiner, Art Unit 2488