DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Applicant Response to Official Action
The response filed on 12/8/2022 has been entered and made of record.
Acknowledgment 
Claims 1-6, 12, 14-16, 18, and 20, as amended on 12/8/2022, are acknowledged by the Examiner.
Response to Arguments
Applicant’s arguments with respect to claims 1, 12, 16, 20, and their dependent claims have been considered but are moot in view of the new grounds of rejection necessitated by the Applicant’s amendments. The Examiner addresses the Applicant’s main arguments below.
Regarding the drawing objection, the amendment filed on 12/8/2022 addresses the issue. As a result, the drawing objection is withdrawn.
Regarding the 35 U.S.C. 101 rejection, in the Remarks filed on 12/8/2022, the Applicant indicated that the invention does not claim the control signal [paragraph 1 on page 10 of the Remarks]. As a result, the 35 U.S.C. 101 rejection is withdrawn.
Regarding the 35 U.S.C. 112(b) rejections, the amendment filed on 12/8/2022 does not address all issues.  As a result, some of the 35 U.S.C. 112(b) rejections are maintained.
Regarding the 35 U.S.C. 102 rejection, the Applicant amended the claims and then argued that, “In particular, Li fails to teach to the claim element "a compressor having circuitry configured to use a compression neural network to compress an image into a compressed representation, the compressed representation comprising a plurality of compressed channels," Applicant submits that Li fails to teach using a "neural network to compress an image." Li teaches "a convolutional neural network (CNN) for determining a mode decision for encoding a block in video coding .... the mode decision comprises a block partitioning of the block." Li, column 1, lines 32-34 and 39-40.” [Paragraphs 3-4 on page 12 of the Remarks].
Examiner respectfully disagrees. The Applicant argued that the CNN of Li is used only to determine a mode decision for encoding a block in video coding and does not actually compress an image into a compressed representation. In support, the Applicant cited only a portion of Li. The Applicant, however, overlooks that Li’s disclosed CNN architecture includes a transmitting station 102, a network 104, and a receiving station 106. Li further describes that, in this architecture, the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106 (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. 
The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]. Li further describes implementing his CNN architecture on multiple computing devices and over a network as follows ((i.e. FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of a single computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like. A CPU 202 in the computing device 200 can be a central processing unit. Alternatively, the CPU 202 can be any other type of device, or multiple devices, now-existing or hereafter developed, capable of manipulating or processing information. Although the disclosed implementations can be practiced with a single processor as shown (e.g., the CPU 202), advantages in speed and efficiency can be achieved by using more than one processor) [Li: col. 5, line 63 – col. 6, line 14; Fig. 2]; (i.e. Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into a single unit, other configurations can be utilized. The operations of the CPU 202 can be distributed across multiple machines (each machine having one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines, such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. 
Although depicted here as a single bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations) [Li: col. 7, line 1–18; Fig. 2]). Li further describes the functions of the encoder, which is implemented on the transmitting station 102 of his CNN architecture, as follows ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. 
Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]). It is clear from Li’s description that his CNN compresses image and video data. As a result, the Applicant’s argument “Li fails to teach using a "neural network to compress an image"” is not persuasive.
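For illustration only, and not as a reproduction of Li's implementation, the following Python sketch shows how a transform stage and a quantization stage of the kind named in the FIG. 4 passage above reduce a pixel block to a small number of nonzero integer levels. It assumes an 8x8 block, an orthonormal 2-D DCT-II as the transform, and uniform scalar quantization with a hypothetical step `qstep`; the function names are likewise hypothetical, and prediction, loop filtering, and entropy coding are omitted.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)  # DC row scaling for orthonormality
    return m * np.sqrt(2 / n)


def transform_and_quantize(block: np.ndarray, qstep: float) -> np.ndarray:
    """Hypothetical forward path: separable 2-D DCT, then uniform scalar quantization."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T
    return np.round(coeffs / qstep).astype(np.int64)


def dequantize_and_inverse(levels: np.ndarray, qstep: float) -> np.ndarray:
    """Hypothetical reconstruction path: dequantization, then inverse 2-D DCT."""
    d = dct_matrix(levels.shape[0])
    return d.T @ (levels * qstep) @ d


# Usage: a smooth 8x8 ramp concentrates its energy in a few low-frequency levels.
block = np.add.outer(np.arange(8), np.arange(8)).astype(float) * 4.0
levels = transform_and_quantize(block, qstep=10.0)
recon = dequantize_and_inverse(levels, qstep=10.0)
print(np.count_nonzero(levels), "of", levels.size, "levels are nonzero")
print("max abs reconstruction error:", np.abs(recon - block).max())
```

Because the DCT matrix is orthonormal, the only loss in this sketch comes from rounding in the quantizer; a larger `qstep` yields fewer nonzero levels at the cost of a larger reconstruction error.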
The Applicant further argued that “Li also fails to teach the claim element "a selector having circuitry configured to select one or more of the plurality of compressed channels from the compressed representation" as recited in claim 1. As described above, Li uses a CNN to generate inputs into the compression process and does not perform the compression process. Because Li provides inputs to the compression process, it does not select one or more of the plurality of compressed channels from the compressed representation" of the image as recited in claim 1” [Paragraph 4 on page 13 of the Remarks].
Examiner respectfully disagrees. As discussed above, Li’s CNN architecture performs the compression process and generates compressed bitstreams. Moreover, Li describes the following ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680.
The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]). As a result, the Applicant’s argument “Li uses a CNN to generate inputs into the compression process and does not perform the compression process. Because Li provides inputs to the compression process, it does not select one or more of the plurality of compressed channels from the compressed representation"” is not persuasive.
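The feature-map arithmetic in the FIG. 9 passage quoted above can be checked with the usual output-size formula for an unpadded convolution, floor((n - k) / s) + 1. The sketch below assumes a 64x64 single-channel input block (the quoted passage states only the 256 filters, the 8x8 filter size, the stride of 8, and the 8x8 output size) and uses random weights solely to show the shapes; the function names are hypothetical and this is not Li's implementation.

```python
import numpy as np


def conv_output_size(n: int, k: int, s: int) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (n - k) // s + 1


def strided_conv(block: np.ndarray, filters: np.ndarray, stride: int) -> np.ndarray:
    """Valid cross-correlation (as in most CNN frameworks) of one block with F filters."""
    num_filters, k, _ = filters.shape
    out = conv_output_size(block.shape[0], k, stride)
    maps = np.empty((num_filters, out, out))
    for r in range(out):
        for c in range(out):
            patch = block[r * stride:r * stride + k, c * stride:c * stride + k]
            # One value per filter: the dot product of each filter with this patch.
            maps[:, r, c] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return maps


rng = np.random.default_rng(0)
block = rng.random((64, 64))                # assumed 64x64 input block
filters = rng.standard_normal((256, 8, 8))  # 256 filters, each 8x8
feature_maps = strided_conv(block, filters, stride=8)
print(feature_maps.shape)  # (256, 8, 8)
```

With the stride equal to the filter size, the 8x8 windows tile the block without overlap, so each of the 256 filters contributes one 8x8 feature map, that is, one channel.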
The Applicant further argued that “Li also fails to teach the claim element "a learning module having circuitry configured to perform a learning task on the one or more selected compressed channels." As described above, Li uses a CNN to generate inputs into the compression process and does not perform the compression process. Because Li provides inputs to the compression process, it does not use compressed channels "to perform a learning task on the one or more selected compressed channels" as recited in claim 1.” [Paragraph 1 on page 14 of the Remarks].
Examiner respectfully disagrees. As discussed above, Li’s CNN architecture performs the compression process and generates compressed bitstreams. Moreover, Li describes the following ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs.
Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected). As a result, the Applicant’s argument “Li also fails to teach the claim element "a learning module having circuitry configured to perform a learning task on the one or more selected compressed channels." As described above, Li uses a CNN to generate inputs into the compression process and does not perform the compression process. Because Li provides inputs to the compression process, it does not use compressed channels "to perform a learning task on the one or more selected compressed channels" as recited in claim 1” is not persuasive.
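The rate-distortion comparison in the col. 4 passages quoted above is commonly written as a Lagrangian cost J = D + λ·R, with the lowest-cost candidate selected. Li's quoted text states only that the metric "can combine the rate and distortion"; the Lagrangian form, the multiplier `lam`, the mode names, and the distortion and rate values below are illustrative assumptions, not values taken from Li.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical partitioning mode with its measured distortion and rate."""
    name: str
    distortion: float  # e.g., sum of squared errors after reconstruction
    rate: float        # e.g., bits needed to code the block in this mode


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate


def best_mode(candidates: list[Candidate], lam: float) -> Candidate:
    """Exhaustive RD search: return the candidate with the smallest cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


modes = [
    Candidate("NONE", distortion=900.0, rate=20.0),
    Candidate("SPLIT", distortion=300.0, rate=80.0),
    Candidate("HORZ", distortion=500.0, rate=45.0),
]
print(best_mode(modes, lam=5.0).name)   # SPLIT
print(best_mode(modes, lam=20.0).name)  # NONE
```

Raising λ weights rate more heavily and shifts the choice toward cheaper modes. A machine-learning model of the kind Li describes would be used to estimate or infer the winning mode without computing every candidate's cost.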
Because none of the three arguments is persuasive, the Examiner respectfully maintains the rejections and the applicability of the cited art.
Regarding the 35 U.S.C. 103 rejection, the Applicant amended the claims and then argued that, “The combination of Li and Zhang fails to teach or suggest the elements of claim 1. As noted above, Li fails to teach these elements, nor does Li suggest them.” [Paragraph 5 on page 14 of the Remarks]. Examiner respectfully disagrees with the Applicant’s argument.
As discussed in the response to the 35 U.S.C. 102 argument, Li discloses all elements of claim 1. As a result, the Applicant’s argument “The combination of Li and Zhang fails to teach or suggest the elements of claim 1. As noted above, Li fails to teach these elements, nor does Li suggest them” is not persuasive.
In addition, in combination with Li’s teachings, Zhang further discloses the following limitations of claim 1:
select one or more of the plurality of compressed channels from the compressed representation ((i.e. extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; (i.e. The training pictures may be selected using any suitable technique or techniques. For example, the training pictures may be selected to include a wide range of video picture scenarios) [Zhang: col. 10, line 53-56]); and perform a learning task on the one or more selected compressed channels ((i.e. The techniques discussed herein use such deep learning neural network training to automatically analyze the input pictures using the selected features and via the deep learning neural network to predict) [Zhang: col. 12, line 19-22]; (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]).
Accordingly, the Examiner respectfully maintains the rejections and the applicability of the cited art.
  
Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b): 
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of pre-AIA 35 U.S.C. 112, second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 12, 16, and 20 recite "use a compression neural network to compress an image into a compressed representation". However, the claims do not recite any compression algorithm, so it is not clear from the claim language what compression method is used. Therefore, claims 1, 12, 16, 20, and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. For purposes of examination in this Office action, any method that reduces the size of an image is considered to meet this limitation, and the compressed representation may be in any compressed format.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. Claims 1, 12, 16, and 20 recite "the selected compressed channels". However, the claims do not specify any condition under which a channel is selected, so it is not clear from the claim language how the selector selects one or more channels. Therefore, claims 1, 12, 16, 20, and their dependent claims are indefinite and are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph. For purposes of examination in this Office action, the one or more compressed channels are assumed to be randomly selected.
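To illustrate why the selection condition matters, the sketch below contrasts the random selection assumed above with criterion-based selection by largest feature value variance or entropy, the criteria recited in claim 11. The C x H x W array layout, the histogram-based entropy estimate, the bin count, the toy data, and all function names are assumptions made for illustration only.

```python
import numpy as np


def select_random(channels: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Indices of k channels chosen uniformly at random (no selection condition)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(channels.shape[0], size=k, replace=False))


def channel_variance(channels: np.ndarray) -> np.ndarray:
    """Per-channel variance of feature values over the spatial dimensions."""
    return channels.reshape(channels.shape[0], -1).var(axis=1)


def channel_entropy(channels: np.ndarray, bins: int = 16) -> np.ndarray:
    """Per-channel Shannon entropy (bits) of a histogram of the channel's values."""
    entropies = []
    for ch in channels:
        counts, _ = np.histogram(ch, bins=bins)
        p = counts[counts > 0] / counts.sum()
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)


def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k channels with the largest scores."""
    return np.sort(np.argsort(scores)[-k:])


# Toy "compressed representation": 8 channels of 4x4 values; channel i is i * base.
base = np.arange(16, dtype=float).reshape(4, 4)
latent = np.stack([i * base for i in range(8)])
print(select_random(latent, k=3))
print(select_top_k(channel_variance(latent), k=3))  # [5 6 7]
print(channel_entropy(latent))
```

For the same input, the random selector and the criterion-based selectors can return different channels, which is the ambiguity the rejection above identifies.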

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 11-13, 15-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Li et al. (US Patent 11,025,907 B2), (“Li”).

Regarding claim 1, Li meets the claim limitations, as follows:
A machine learning system ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: a compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to use a compression neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. 
The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to compress an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4];  (i.e. 
The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) into a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51], the compressed representation (i.e. the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); a selector having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. 
While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used.
Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected. 
This method is also an statistics approach.)  from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and a learning module having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to perform a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. 
In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique, which is used during the rate-distortion analysis, which determines what channels or bitstreams should be selected))  on the one or more selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].   
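For illustration only, the rate-distortion comparison in the cited passages of Li (col. 4, line 5-11 and 39-42), in which rate and distortion are combined into a single scalar RD cost and a best mode is selected from the examined combinations, can be sketched in Python as follows. The sketch assumes the common Lagrangian form J = D + λ·R; Li states only that the metric "can combine the rate and distortion," so the weight λ, the candidate mode names, and the numeric values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical partitioning-mode label
    distortion: float  # distortion of the reconstructed block in this mode (assumed units)
    rate: float        # bits needed to code the block in this mode (assumed units)


def rd_cost(c: Candidate, lam: float) -> float:
    """Single scalar RD cost, assuming the Lagrangian form J = D + lambda * R."""
    return c.distortion + lam * c.rate


def best_mode(candidates: list[Candidate], lam: float) -> Candidate:
    """Compare the RD cost of every examined candidate and return the lowest."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


if __name__ == "__main__":
    modes = [
        Candidate("NONE", distortion=900.0, rate=40.0),
        Candidate("SPLIT", distortion=300.0, rate=160.0),
        Candidate("HORZ", distortion=520.0, rate=90.0),
    ]
    print(best_mode(modes, lam=4.0).name)  # HORZ (cost 880 vs. 1060 and 940)
```

Changing λ changes which mode wins (with λ = 1.0 the SPLIT candidate has the lowest cost), which is why an encoder ties the weight to its quantization setting.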

Regarding claim 11, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows:
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), wherein the selector has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
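As an illustrative aid to the claim language "a plurality of top compressed channels with largest entropies or feature value variances," the following Python/NumPy sketch ranks the channels of a compressed representation by per-channel feature-value variance or by a histogram estimate of per-channel entropy, and keeps the top k. The (C, H, W) array layout, the 16-bin histogram estimator, and the function names are assumptions made for this sketch; they are not taken from Li or from the present application.

```python
import numpy as np


def channel_variance(latent: np.ndarray) -> np.ndarray:
    """Variance of the feature values in each channel of a (C, H, W) array."""
    return latent.reshape(latent.shape[0], -1).var(axis=1)


def channel_entropy(latent: np.ndarray, bins: int = 16) -> np.ndarray:
    """Shannon entropy (bits) of each channel, estimated from a value histogram.

    The histogram estimator and the default bin count are assumptions.
    """
    entropies = []
    for values in latent.reshape(latent.shape[0], -1):
        counts, _ = np.histogram(values, bins=bins)
        p = counts[counts > 0] / counts.sum()  # empirical probabilities of occupied bins
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)


def select_top_channels(latent: np.ndarray, k: int, score: str = "entropy", bins: int = 16):
    """Return (indices, channels) of the k highest-scoring channels, best first."""
    if score == "entropy":
        scores = channel_entropy(latent, bins)
    elif score == "variance":
        scores = channel_variance(latent)
    else:
        raise ValueError(f"unknown score: {score!r}")
    idx = np.argsort(scores)[::-1][:k]  # descending order of score
    return idx, latent[idx]
```

A constant channel carries zero entropy and zero variance, so it is never among the selected channels when any other channel varies.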

Regarding claim 12, Li meets the claim limitations, as follows:
A method (i.e. a method) [Li: col. 2, line 23] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with which the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected)) on the one or more selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
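The feature-map dimensions in the cited passage of Li (col. 18, line 34-38), where 256 filters of size 8x8 are applied with a stride equal to the filter size and yield 256 feature maps of size 8x8, follow from the usual output-size relation for an unpadded convolution, floor((N - k)/s) + 1. The Python/NumPy sketch below assumes a 64x64 input block and no padding, which is consistent with, but not stated in, the quoted passage; the helper names are hypothetical.

```python
import numpy as np


def conv_output_size(n: int, k: int, stride: int, pad: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * pad - k) // stride + 1


def nonoverlapping_conv(block: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Unpadded convolution (cross-correlation, as used in CNNs) with stride == kernel size.

    block:   (N, N) array; N is assumed to be a multiple of the kernel size k.
    filters: (F, k, k) array of F square kernels.
    Returns an (F, N // k, N // k) array of feature maps.
    """
    _, k, _ = filters.shape
    m = block.shape[0] // k
    # Split the block into an m x m grid of non-overlapping k x k patches: (m, m, k, k).
    patches = block[: m * k, : m * k].reshape(m, k, m, k).transpose(0, 2, 1, 3)
    # Each output value is the elementwise product of one patch and one kernel, summed.
    return np.einsum("ijab,fab->fij", patches, filters)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.standard_normal((64, 64))       # assumed 64x64 block
    filters = rng.standard_normal((256, 8, 8))  # 256 filters of size 8x8
    maps = nonoverlapping_conv(block, filters)
    print(conv_output_size(64, k=8, stride=8), maps.shape)  # 8 (256, 8, 8)
```

Because the stride equals the kernel size, the receptive fields do not overlap, so each output value depends on exactly one 8x8 region of the block.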

Regarding claim 13, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows:
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], further comprising: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 15, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows:
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], wherein selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] the one or more compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach) comprises:
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].

Regarding claim 16, Li meets the claim limitations, as follows:
An apparatus (i.e. an apparatus) [Li: col. 2, line 22] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33];  (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: at least one memory (i.e. a memory) [Li: col. 6, line 14] for storing instructions (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]; and at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]:compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. 
The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4];  (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as 40 described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4])  with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 
1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/22, 256/23, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 
3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In in the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data in the manners described below. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106. 25 an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: can include chrominance pixels 690. 
For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N;,M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6] ; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. 
Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique, which is used during the rate-distortion analysis, which determines what channels or bitstreams should be selected. This method is also an statistics approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. 
a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected)) on one or more of the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
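As a reading aid for the rate-distortion passage quoted from Li (col. 4, lines 5-11), the following is a minimal sketch of selecting a best mode by a single scalar RD cost. It assumes the common Lagrangian combination J = D + λ·R; Li states only that the metric "can combine" rate and distortion. The class, function names, mode labels, and numbers are hypothetical and are not taken from Li or from the application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeResult:
    """Hypothetical measurements for one candidate partitioning mode."""
    name: str
    distortion: float  # e.g., sum of squared errors after reconstruction
    rate_bits: float   # estimated bits needed to code the block in this mode


def rd_cost(result: ModeResult, lam: float) -> float:
    """Combine rate and distortion into one scalar: J = D + lambda * R (assumed form)."""
    return result.distortion + lam * result.rate_bits


def select_best_mode(results: Sequence[ModeResult], lam: float) -> ModeResult:
    """Return the candidate with the smallest RD cost (ties go to the earliest)."""
    return min(results, key=lambda r: rd_cost(r, lam))


if __name__ == "__main__":
    candidates = [
        ModeResult("NONE", distortion=120.0, rate_bits=10.0),
        ModeResult("SPLIT", distortion=40.0, rate_bits=60.0),
        ModeResult("HORZ", distortion=75.0, rate_bits=30.0),
    ]
    for lam in (1.0, 3.0):
        print(lam, select_best_mode(candidates, lam).name)
```

With these made-up numbers, λ = 1.0 selects SPLIT (cost 100.0) while λ = 3.0 selects NONE (cost 150.0), which illustrates how the weighting of rate against distortion changes the selected mode.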

Regarding claim 17, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 19, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows.
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: 
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. 
The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].  
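Claim 19 recites selecting the top compressed channels "with largest entropies or feature value variances." As a reading aid only, the following is a minimal sketch of such a ranking. It assumes each compressed channel is available as a flat list of feature values and that entropy is estimated from a fixed-width histogram; the function names, channel labels, the 16-bin default, and the data layout are all hypothetical and are not taken from Li or from the application.

```python
import math
import statistics
from typing import Dict, List, Sequence


def channel_variance(values: Sequence[float]) -> float:
    """Population variance of the feature values in one compressed channel."""
    return statistics.pvariance(values)


def channel_entropy(values: Sequence[float], num_bins: int = 16) -> float:
    """Shannon entropy (bits) of a fixed-width histogram of the channel values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant channel has zero entropy
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        counts[min(int((v - lo) / width), num_bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def select_top_channels(
    channels: Dict[str, Sequence[float]], k: int, criterion: str = "variance"
) -> List[str]:
    """Rank channels by variance or entropy (largest first) and keep the top k."""
    score = channel_variance if criterion == "variance" else channel_entropy
    ranked = sorted(channels, key=lambda name: score(channels[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    compressed = {
        "ch0": [0.0] * 8,
        "ch1": [0.0, 1.0] * 4,
        "ch2": [0.0, 4.0] * 4,
        "ch3": [float(i) for i in range(8)],
    }
    print(select_top_channels(compressed, k=2))
    print(select_top_channels(compressed, k=1, criterion="entropy"))
```

In this toy data the variances are 0.0, 0.25, 4.0, and 5.25, so the two highest-variance channels are ch3 and ch2; ch3 also has the largest histogram entropy (3 bits).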

Regarding claim 20, Li meets the claim limitations, as follows:
A non-transitory computer readable storage medium (i.e. a memory) [Li: col. 6, line 14] storing a set of instructions that are executable by one or more processing devices to cause a computer to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. 
Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. 
The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a part of the compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. 
For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. 
As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. 
In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) 
forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected)) on the selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Patent 11,025,907 B2), (“Li”), in view of Zhang et al. (US Patent 10,721,471 B2), (“Zhang”).
Regarding claim 1, Li meets the claim limitations, as follows:
A machine learning system ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: a compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to use a compression neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. 
The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to compress an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. 
The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) into a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51], the compressed representation (i.e. the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); a selector having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. 
While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. 
Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected. 
This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and a learning module having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to perform a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. 
In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected)) on the one or more selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
In the same field of endeavor, Zhang further discloses the claim limitations as follows:
select one or more of the compressed channels from the plurality of compressed representation ((i.e. extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; (i.e. The training pictures may be selected using any suitable technique or techniques. For example, the training pictures may be selected to include a wide range of video picture scenarios) [Zhang: col. 10, line 53-56]); and performing a learning task on the one or more selected compressed channels ((i.e. The techniques discussed herein use such deep learning neural network training to automatically analyze the input pictures using the selected features and via the deep learning neural network to predict) [Zhang: col. 12, line 19-22]; (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].
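For illustration only, Li's description (quoted above in the mapping of claim 1) of computing a metric for each examined combination, combining rate and distortion into a single scalar RD cost, and selecting a best mode can be sketched in Python. The sketch assumes the common Lagrangian form J = D + λ·R; Li, as quoted, does not state this formula, and the mode labels, distortion and rate values, and λ are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One examined partitioning mode (hypothetical fields, not taken from Li)."""
    mode: str          # hypothetical mode label
    distortion: float  # assumed distortion measure, e.g., sum of squared errors
    rate: float        # assumed number of bits needed to code the block in this mode


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Combine rate and distortion into one scalar, assuming J = D + lam * R."""
    return candidate.distortion + lam * candidate.rate


def best_mode(candidates: list[Candidate], lam: float) -> Candidate:
    """Compute the RD cost of every examined candidate and keep the lowest."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


if __name__ == "__main__":
    examined = [
        Candidate("NONE", distortion=120.0, rate=10.0),
        Candidate("SPLIT", distortion=40.0, rate=60.0),
        Candidate("HORZ", distortion=75.0, rate=30.0),
    ]
    for lam in (1.0, 2.0):
        print(lam, best_mode(examined, lam).mode)
```

With these made-up numbers, a weight of 1.0 selects SPLIT (cost 100) and a weight of 2.0 selects HORZ (cost 135), showing how the single scalar trades rate against distortion.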

Regarding claim 2, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a first multiplexer (i.e. circuit) [Li: col. 39, line 9] communicatively coupled with (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48] the compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] and the selector and having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] and multiplex the compressed representation from the compressor and the one or more selected compressed channels from the selector (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48; Figs. 2, 7, 9, 12-15].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 1, further comprising: a first multiplexer communicatively coupled with the compressor and the selector and having circuitry configured to multiplex the compressed representation from the compressor and the one or more selected compressed channels from the selector.  
However, in the same field of endeavor, Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) (i.e. device 110 may select) [Zhang: col. 4, line 10; Figs. 12-14] 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 3, Li meets the claim limitations as set forth in claim 2. Li further meets the claim limitations as follows.
The machine learning system of claim 2 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a decompressor (i.e. a decoder) [Li: col. 3, line 1; Fig. 5] having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] decompress the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]); and a second multiplexer having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] receive the compressed representation (i.e. other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream) [Li: col. 5, line 44-47] or the one or more selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. 
As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]), the second multiplexer (i.e. circuit) [Li: col. 39, line 9] being communicatively coupled with the learning module and decompressor (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48] the compressor (i.e. an encoder) [Li: col. 7, line 47; Fig. 4]  and having circuitry configured (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9]  to output the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the decompressor (i.e. 
The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35] or the one or more selected compressed channels to the learning module ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]).
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 2, further comprising: a decompressor having circuitry configured to decompress the compressed representation to generate a decompressed image; and a second multiplexer having circuitry configured to receive the compressed representation or the one or more selected compressed channels, the second multiplexer being communicatively coupled with the learning module and decompressor and having circuitry configured to output the compressed representation to the decompressor or the one or more selected compressed channels to the learning module.   
However, in the same field of endeavor, Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a second multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 4, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a transmitter (i.e. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices) [Li: col. 5, line 16-21] communicatively coupled with the first multiplexer and configured to transmit the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1] or the one or more selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. 
The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]); and a receiver (i.e. In one example, the receiving station 106 can be a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices) [Li: col. 5, line 32-38] communicatively coupled with the second multiplexer and configured to receive the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1] or the one or more selected compressed channels ((i.e. 
determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]) from the transmitter and provide the received compressed representation or the one or more selected compressed channels to the second multiplexer (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 
5, line 22-32; Fig. 1].  
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a transmitter communicatively coupled with the first multiplexer and configured to transmit the compressed representation or the one or more selected compressed channels; and a receiver communicatively coupled with the second multiplexer and configured to receive the compressed representation or the one or more selected compressed channels from the transmitter and provide the received compressed representation or the one or more selected compressed channels to the second multiplexer.  
However, in the same field of endeavor, Zhang further discloses the claim limitations and the deficient claim limitations as follows:
a transmitter (i.e. transmitters) [Zhang: col. 20, line 14] (i.e. bit stream multiplexer) [Zhang: col. 16, line 53; Figs. 12-14] (i.e. receivers) [Zhang: col. 20, line 14] (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14] (i.e. transmitters) [Zhang: col. 20, line 14] (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14].   
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 5, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: 
a memory (i.e. a memory) [Li: col. 6, line 14] for storing the compressed representation or the one or more selected compressed channels ((i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48]; (i.e. the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing) [Li: col. 6, line 30-33; Figs. 4-5]) from the first multiplexer, wherein the second multiplexer has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] read the stored compressed representation or the one or more selected compressed channels from the memory (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a memory for storing the compressed representation or the one or more selected compressed channels from the first multiplexer, wherein the second multiplexer has circuitry configured to read the stored compressed representation or the one or more selected compressed channels from the memory.   
However, in the same field of endeavor, Zhang further discloses the deficient claim limitations and the claim limitations as follows:
the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]), wherein the second multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 6, Li meets the claim limitations as set forth in claim 3. Li further meets the claim limitations as follows.
The machine learning system of claim 3 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: 
a controller (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] communicatively coupled with the first multiplexer and the second multiplexer (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured to (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] send a control signal (i.e. communicate the binary data) [Li: col. 13, line 14-15] to the first multiplexer and the second multiplexer (i.e. circuit) [Li: col. 39, line 9], wherein the first multiplexer has circuitry configured to (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] output (i.e. output to the compressed bitstream) [Li: col. 8, line 51], according to the control signal (i.e. carrying out any of the methods, algorithms, or instructions described herein) [Li: col. 39, line 22-24], the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] or the one or more selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. 
In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]), and the second multiplexer has circuitry configured (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] to output (i.e. output to the compressed bitstream) [Li: col. 8, line 51], according to the control signal (i.e. carrying out any of the methods, algorithms, or instructions described herein) [Li: col. 39, line 22-24], the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the decompressor (i.e. The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. 
Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35] or the one or more selected compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65; Figs. 5, 12-16]) to the learning module (i.e. a machine-learning model) [Li: col. 12, line 33].    
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 3, further comprising: a controller communicatively coupled with the first multiplexer and the second multiplexer and having circuitry configured to send a control signal to the first multiplexer and the second multiplexer, wherein the first multiplexer has circuitry configured to output, according to the control signal, the compressed representation or the one or more selected compressed channels, and the second multiplexer has circuitry configured to output, according to the control signal, the compressed representation to the decompressor or the one or more selected compressed channels to the learning module.    
However, in the same field of endeavor, Zhang further discloses the deficient claim limitations and the claim limitations as follows:
a controller communicatively coupled with the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and the second multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and having circuitry configured to send a control signal (i.e. In various implementations, platform 1302 may receive control signals from navigation controller) [Zhang: col. 19, line 26-17] to the first multiplexer and the second multiplexer (i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14], wherein the first multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) (i.e. signal bearing media providing instructions) [Zhang: col. 16, line 66-67; Figs. 12-14], ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.  
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 7, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a decompressor (i.e. a decoder) [Li: col. 3, line 1; Fig. 5] having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] decompress the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]); a multiplexer having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] receive the compressed representation (i.e. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream) [Li: col. 5, line 45-47], the multiplexer being communicatively coupled with the selector and the decompressor (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] output the compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] to the selector or the decompressor (i.e. The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420) [Li: col. 9, line 26-35], wherein the selector (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is communicatively coupled (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] with the learning module (i.e. a machine-learning model) [Li: col. 12, line 33].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 1, further comprising: a decompressor having circuitry configured to decompress the compressed representation to generate a decompressed image; a multiplexer having circuitry configured to receive the compressed representation, the multiplexer being communicatively coupled with the selector and the decompressor and having circuitry configured to output the compressed representation to the selector or the decompressor, wherein the selector is communicatively coupled with the learning module.
However, in the same field of endeavor, Zhang discloses the deficient claim limitations as follows:
a multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]); the multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.  
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 8, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising: a transmitter (i.e. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices) [Li: col. 5, line 16-21] communicatively coupled with the compressor and configured to transmit the compressed representation (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1]; and a receiver (i.e. In one example, the receiving station 106 can be a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices) [Li: col. 5, line 32-38] communicatively coupled with the multiplexer and configured to receive the compressed representation from the transmitter and provide the received compressed representation to the multiplexer (i.e. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 22-32; Fig. 1].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a transmitter communicatively coupled with the compressor and configured to transmit the compressed representation; and a receiver communicatively coupled with the multiplexer and configured to receive the compressed representation from the transmitter and provide the received compressed representation to the multiplexer.   
However, in the same field of endeavor, Zhang discloses the deficient claim limitations as follows:
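a transmitter (i.e. transmitters) [Zhang: col. 20, line 14]; a receiver (i.e. receivers) [Zhang: col. 20, line 14] configured to receive the compressed representation from the transmitter (i.e. transmitters) [Zhang: col. 20, line 14] and provide the received compressed representation to the multiplexer (i.e. de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14].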
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.  
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 9, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising:
a memory (i.e. a memory) [Li: col. 6, line 14] for storing the compressed representation from the compressor (i.e. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding) [Li: col. 5, line 41-48], wherein the multiplexer (i.e. circuit) [Li: col. 39, line 9] is configured to read the stored compressed representation from the memory (i.e. the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing) [Li: col. 6, line 30-33; Figs. 4-5].
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a memory for storing the compressed representation from the compressor, wherein the multiplexer is configured to read the stored compressed representation from the memory.    
However, in the same field of endeavor, Zhang discloses the deficient claim limitations as follows:
wherein the multiplexer ((i.e. Those skilled in the art may recognize that systems described herein may include additional components that have not been depicted in the corresponding figures. For example, the systems discussed herein may include additional components such as bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 48-53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.  
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 10, Li meets the claim limitations as set forth in claim 7. Li further meets the claim limitations as follows.
The machine learning system of claim 7 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), further comprising:
a controller (i.e. an encoder) [Li: col. 7, line 47; Fig. 4] having circuitry (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] communicatively coupled with the multiplexer (i.e. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device) [Li: col. 39, line 49-54] and having circuitry configured (i.e. programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] to send a control signal (i.e. communicate the binary data) [Li: col. 13, line 14-15] to the multiplexer (i.e. circuit) [Li: col. 39, line 9].  
Li does not explicitly disclose the following claim limitations (Emphasis added).
The machine learning system of claim 7, further comprising: a controller communicatively coupled with the multiplexer and having circuitry configured to send a control signal to the multiplexer.      
However, in the same field of endeavor, Zhang discloses the deficient claim limitations as follows:
a controller communicatively coupled with the multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]) and having circuitry configured to send a control signal (i.e. In various implementations, platform 1302 may receive control signals from navigation controller) [Zhang: col. 19, line 26-17] to the multiplexer ((i.e. bit stream multiplexer or de-multiplexer) [Zhang: col. 16, line 53; Figs. 12-14]; (i.e. switches) [Zhang: col. 21, line 23; Figs. 12-14]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with those of Zhang to program the processor to perform the selection task.  
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].
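For illustration only, the routing recited in claims 7 and 10 (a controller sends a control signal, and the multiplexer outputs the compressed representation either to the selector for the learning task or to the decompressor for reconstruction) can be sketched as below. The sketch is not drawn from Li or Zhang; the `Route` values and the `multiplex` function are hypothetical names chosen for this example, and the selector and decompressor are modeled as plain callables.

```python
from enum import Enum
from typing import Any, Callable


class Route(Enum):
    """Hypothetical control-signal values sent by the controller."""
    LEARNING = "selector"            # forward to the selector (learning task)
    RECONSTRUCTION = "decompressor"  # forward to the decompressor (reconstruction)


def multiplex(
    compressed: Any,
    control: Route,
    selector: Callable[[Any], Any],
    decompressor: Callable[[Any], Any],
) -> Any:
    """Forward the compressed representation to exactly one consumer,
    chosen by the control signal, and return that consumer's result."""
    if control is Route.LEARNING:
        return selector(compressed)
    if control is Route.RECONSTRUCTION:
        return decompressor(compressed)
    raise ValueError(f"unsupported control signal: {control!r}")


# Usage: the same compressed representation is sent down one path per call.
latent = [[0, 4, 0, 4], [1, 2, 3, 4]]
print(multiplex(latent, Route.LEARNING,
                lambda c: ("selector", len(c)),
                lambda c: ("decompressor", len(c))))
```

In this sketch the compressed representation is passed through unchanged on either path; only its destination depends on the control signal.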

Regarding claim 11, Li meets the claim limitations as set forth in claim 1. Li further meets the claim limitations as follows.
The machine learning system of claim 1 ((i.e. a computing system) [Li: col. 6, line 1]; (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), wherein the selector has circuitry configured to (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] select (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].
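For illustration only, the claim 11 limitation as worded (selecting the top compressed channels with the largest entropies or feature value variances) can be sketched as below. The sketch is not taken from Li or Zhang; the function names, the list-of-lists channel layout, the assumption that every channel is non-empty, and the rounding-based entropy estimate are all assumptions made for this example.

```python
import math
from collections import Counter
from statistics import pvariance


def channel_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of one channel.

    Assumes the values are quantized symbols, as in a quantized latent;
    any real-valued inputs are rounded to the nearest integer first.
    """
    counts = Counter(round(v) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_top_channels(channels, k, score="variance"):
    """Return the indices of the k channels with the largest score,
    highest first. `channels` is a list of non-empty flat lists of numbers;
    `score` is "variance" or "entropy"."""
    if score == "variance":
        scores = [pvariance(ch) for ch in channels]
    elif score == "entropy":
        scores = [channel_entropy(ch) for ch in channels]
    else:
        raise ValueError(f"unknown score: {score}")
    ranked = sorted(range(len(channels)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Usage: three toy channels of a compressed representation.
latent = [[0, 0, 0, 0], [0, 4, 0, 4], [1, 2, 3, 4]]
print(select_top_channels(latent, 2))                    # -> [1, 2]
print(select_top_channels(latent, 2, score="entropy"))   # -> [2, 1]
```

The two criteria can rank channels differently: the second channel has the larger variance (4 versus 1.25), while the third has the larger entropy (2 bits versus 1 bit).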

Regarding claim 12, Li meets the claim limitations, as follows:
A method (i.e. a method) [Li: col. 2, line 23] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2², 256/2³, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected. It is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis to determine which channels or bitstreams should be selected.) on the one or more selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
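As context for the notes in the claim 12 mapping, rate-distortion mode decisions are commonly written as a Lagrangian cost minimized over the candidate modes. Li's quoted text states only that the metric "can combine" rate and distortion into a single scalar RD cost, so the specific form below is the conventional one and is given as an assumption, not as Li's formula:

```latex
J(m) = D(m) + \lambda \, R(m), \qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J(m)
```

Here D(m) is the distortion and R(m) the rate for a candidate mode m in the candidate set 𝓜, and λ is a Lagrange multiplier that many codecs derive from the QP, which is why the QP enters the RD cost non-linearly.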
In the same field of endeavor, Zhang further discloses the claim limitations as follows:
selecting one or more of the plurality of compressed channels from the compressed representation ((i.e. extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; (i.e. The training pictures may be selected using any suitable technique or techniques. For example, the training pictures may be selected to include a wide range of video picture scenarios) [Zhang: col. 10, line 53-56]); and performing a learning task on the one or more selected compressed channels ((i.e. The techniques discussed herein use such deep learning neural network training to automatically analyze the input pictures using the selected features and via the deep learning neural network to predict) [Zhang: col. 12, line 19-22]; (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]).
It would have been obvious to one with an ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform selection task.  
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 13, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows:
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], further comprising: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 14, Li meets the claim limitations as set forth in claim 13. Li further meets the claim limitations as follows:
The method of claim 13 (i.e. a method) [Li: col. 2, line 23], further comprising: 
determining (i.e. determining a mode decision) [Li: col. 39, line 66-67] whether the learning task (i.e. The machine-learning model may be trained using the vast amount of training data that is available from an encoder performing standard encoding techniques, such as those described below. More specifically, the training data can be used during the learning phase of machine learning to derive (e.g., learn, infer, etc.) the machine-learning model that is (e.g., defines, constitutes, etc.) a mapping from the input data (e.g., block data) to an output) [Li: col. 4, line 43-50] or a reconstruction task (i.e. a reconstruction stage) [Li: col. 8, line 2] is to be performed (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning) [Li: col. 12, line 32-33; Fig. 7]; in response to the learning task being determined to be performed (i.e. The process 1600 trains, using input data, a machine-learning model to infer one or more mode decisions. The process 1600 then uses the trained machine-learning model to infer a mode decision for an image block) [Li: col. 34, line 40-33; Fig. 16], selecting the one or more compressed channels from the compressed representation ((i.e. At 1602, the process 1600 trains the machine-learning (ML) model. The ML model can be trained using a training data 1612. Each training datum of the training data 1612 can include a video block that was encoded by traditional encoding methods (e.g., by an encoder such as described with respect to FIGS. 4 and 6-8); a QP used by the encoder; zero or more additional inputs corresponding to inputs used by the encoder in determining the mode decision (e.g., block partitioning and optionally prediction mode and/or transform unit size) for encoding the video block; and the resulting mode decision determined by the encoder.) [Li: col. 12, line 54-64-33; Figs. 4, 6-8, 16-17]; (i.e. In an example, the mode decision can be a quad-tree partition decision of the image block. The image block can be a block of an image (e.g., a video frame) that is encoded using intra-prediction. In another example, the mode decision can be a partition that includes partitions described with respect to FIG. 17 described below. As further described below, some of the partitions of FIG. 17 include square and non-square sub-partition; and each of the square sub-partitions can be further partitioned according to one of the partitions of FIG. 17) [Li: col. 34, line 44-53; Figs. 16-17]; and in response to the reconstruction task being determined to be performed (i.e. The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that both the encoder 400 and a decoder 500 (described below) use the same reference frames and blocks to decode the compressed bitstream 420) [Li: col. 8, line 58-62], decompressing the compressed representation to generate the decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).
In the same field of endeavor, Zhang also discloses the claim limitations as follows:
in response to the learning task being determined to be performed (i.e. Processing may continue at operation 1103, where a machine learning engine is applied to a feature vector including the features generated at operation 1102, a target bitrate for the picture, and a resolution of the picture to generate an estimated quantization parameter for encoding the picture) [Zhang: col. 16, line 4-9], selecting the one or more compressed channels from the compressed representation (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]; and in response to the reconstruction task being determined to be performed (i.e. combining reconstructed residual blocks with reference blocks) [Zhang: col. 4, line 34-35], decompressing the compressed representation to generate the decompressed image (i.e. The compressed signal or data may then be decoded via a decoder that decodes or decompresses the signal or data for display to a user) [Zhang: col. 1, line 16-19].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 15, Li meets the claim limitations as set forth in claim 12. Li further meets the claim limitations as follows:
The method of claim 12 (i.e. a method) [Li: col. 2, line 23], wherein selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] the one or more compressed channels ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach) comprises:
selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]) with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter (e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].

Regarding claim 16, Li meets the claim limitations, as follows:
An apparatus (i.e. an apparatus) [Li: col. 2, line 22] for machine learning ((i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning model) [Li: col. 12, line 32-33]; (i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]), comprising: at least one memory (i.e. a memory) [Li: col. 6, line 14] for storing instructions (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]; and at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102.
The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690.
For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs.
Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; ((i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique used during rate-distortion analysis, which determines which channels or bitstreams should be selected)) on the one or more selected compressed channels (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37].
In the same field of endeavor, Zhang further discloses the claim limitations as follows:
selecting one or more of the plurality of compressed channels from the compressed representation ((i.e. extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; (i.e. The training pictures may be selected using any suitable technique or techniques. For example, the training pictures may be selected to include a wide range of video picture scenarios) [Zhang: col. 10, line 53-56]); and performing a learning task on the one or more selected compressed channels ((i.e. The techniques discussed herein use such deep learning neural network training to automatically analyze the input pictures using the selected features and via the deep learning neural network to predict) [Zhang: col. 12, line 19-22]; (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 17, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows:
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: decompressing (i.e. decompression) [Li: col. 38, line 42] the compressed representation to generate a decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).

Regarding claim 18, Li meets the claim limitations as set forth in claim 17. Li further meets the claim limitations as follows:
The apparatus of claim 17 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]:
determining (i.e. determining a mode decision) [Li: col. 39, line 66-67] whether the learning task (i.e. The machine-learning model may be trained using the vast amount of training data that is available from an encoder performing standard encoding techniques, such as those described below. More specifically, the training data can be used during the learning phase of machine learning to derive (e.g., learn, infer, etc.) the machine-learning model that is (e.g., defines, constitutes, etc.) a mapping from the input data (e.g., block data) to an output) [Li: col. 4, line 43-50] or a reconstruction task (i.e. a reconstruction stage) [Li: col. 8, line 2] is to be performed (i.e. an encoder (e.g., the encoder 400 of FIG. 4) or a machine-learning) [Li: col. 12, line 32-33; Fig. 7]; in response to the learning task being determined to be performed (i.e. The process 1600 trains, using input data, a machine-learning model to infer one or more mode decisions. The process 1600 then uses the trained machine-learning model to infer a mode decision for an image block) [Li: col. 34, line 40-33; Fig. 16], selecting the one or more compressed channels from the compressed representation ((i.e. At 1602, the process 1600 trains the machine-learning (ML) model. The ML model can be trained using a training data 1612. Each training datum of the training data 1612 can include a video block that was encoded by traditional encoding methods (e.g., by an encoder such as described with respect to FIGS. 4 and 6-8); a QP used by the encoder; zero or more additional inputs corresponding to inputs used by the encoder in determining the mode decision (e.g., block partitioning and optionally prediction mode and/or transform unit size) for encoding the video block; and the resulting mode decision determined by the encoder.) [Li: col. 12, line 54-64-33; Figs. 4, 6-8, 16-17]; (i.e. In an example, the mode decision can be a quad-tree partition decision of the image block. The image block can be a block of an image (e.g., a video frame) that is encoded using intra-prediction. In another example, the mode decision can be a partition that includes partitions described with respect to FIG. 17 described below. As further described below, some of the partitions of FIG. 17 include square and non-square sub-partition; and each of the square sub-partitions can be further partitioned according to one of the partitions of FIG. 17) [Li: col. 34, line 44-53; Figs. 16-17]; and in response to the reconstruction task being determined to be performed (i.e. The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that both the encoder 400 and a decoder 500 (described below) use the same reference frames and blocks to decode the compressed bitstream 420) [Li: col. 8, line 58-62], decompressing the compressed representation to generate the decompressed image ((i.e. decoding of the video stream) [Li: col. 5, line 23-24]; (i.e. an apparatus for decoding an image) [Li: col. 2, line 21-22]; (i.e. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the receiving station 106 to decode video data) [Li: col. 9, line 19-22]).
In the same field of endeavor, Zhang also discloses the claim limitations as follows:
in response to the learning task being determined to be performed (i.e. Processing may continue at operation 1103, where a machine learning engine is applied to a feature vector including the features generated at operation 1102, a target bitrate for the picture, and a resolution of the picture to generate an estimated quantization parameter for encoding the picture) [Zhang: col. 16, line 4-9], selecting the one or more compressed channels from the compressed representation (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]; and in response to the reconstruction task being determined to be performed (i.e. combining reconstructed residual blocks with reference blocks) [Zhang: col. 4, line 34-35], decompressing the compressed representation to generate the decompressed image (i.e. The compressed signal or data may then be decoded via a decoder that decodes or decompresses the signal or data for display to a user) [Zhang: col. 1, line 16-19].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang will enable the system to achieve appropriate QP selection for high quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Regarding claim 19, Li meets the claim limitations as set forth in claim 16. Li further meets the claim limitations as follows:
The apparatus of claim 16 (i.e. an apparatus) [Li: col. 2, line 22], wherein the at least one processor (i.e. microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit) [Li: col. 39, line 7-9] is configured to execute the instructions to cause the apparatus to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]:
selecting  (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] a plurality of top compressed channels with largest entropies or feature value variances from the compressed representation (i.e. At a high level, and without loss of generality, a machine learning model, such as a classification deep-learning model, includes two main portions: a feature-extraction portion and a classification portion. The feature-extraction portion detects features of the model. The classification portion attempts to classify the detected features into a desired response. Each of the portions can include one or more layers and/or one or more operations. As mentioned above, a CNN is an example of a machine learning model. In a CNN, the feature extraction portion can include a set of convolutional operations, which is typically a series of filters that are used to filter an input image based on a filter ( e.g., a square of size k). For example, and in the context of machine vision, these filters can be used to find features in an input image. The features can include, for example, edges, corners, endpoints, and so on. As the number of stacked convolutional operations increases, later convolutional operations can find higher-level features. In a CNN, the classification portion may be a set of fully connected layers. The fully connected layers can be thought of as looking at all the input features of an image in order to generate a high-level classifier. Several stages (e.g., a series) of high-level classifiers eventually generate the desired classification output.) [Li: col. 14, line 38-61].

Regarding claim 20, Li meets the claim limitations, as follows:
A non-transitory computer readable storage medium (i.e. a memory) [Li: col. 6, line 14] storing a set of instructions that are executable by one or more processing devices to cause a computer to perform (i.e. a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU) [Li: col. 7, line 50-54; Fig. 2]: compressing an image ((i.e. FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor, such as the CPU 202, cause the transmitting station 102 to encode video data in manners described herein. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300) [Li: col. 7, line 47 – col. 8, line 4; Fig. 4]; (i.e. Other variations of the encoder 400 can be used to encode the compressed bitstream 420) [Li: col. 9, line 8-9; Fig. 4]; (i.e. The encoder can encode the binary data in a compressed bitstream, such as the compressed bitstream 420 of FIG. 4) [Li: col. 13, line 14-16; Fig. 4]; (i.e. Feature map 1014 is the output of convolving the region 1002 and the filter 1004) [Li: col. 18, line 20-21; Fig. 10]; (i.e. consistent with the description of FIG. 4, ultimately entropy encode, as described with respect to the entropy encoding stage 408, the image block in a compressed bitstream, such as the bitstream 420 of FIG. 4) [Li: col. 36, line 38-42; Fig. 4]) with a neural network ((i.e. encoding a block in video coding using a convolutional neural network) [Li: col. 41, line 30-32]; (i.e. Further details of the inventive CNN architectures according to the teachings herein will be discussed below first with reference to a block-based codec with the teachings may be incorporated. Although a block-based codec is described as an example, other codecs may be used with the present teachings, including a feature-based codec. FIG. 1 is a schematic of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware, such as that described with respect to FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices. A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106) [Li: col. 5, line 9-67; Fig. 1-2]) to generate a compressed representation (i.e. output to the compressed bitstream) [Li: col. 8, line 51] comprising a plurality of compressed channels ((i.e. With respect to the classifier 918, IncomingFeature is 256 (as illustrated by the features maps 919, which is of size 4x4x256), the feature reduction parameter F is 2, and the threshold parameter is 32. As such, the classifier 918 reduces the number of channels according to the progression 256, 256/2, 256/2^2, 256/2^3, … and 1) [Li: col. 23, line 10-16]; (i.e. two chroma channels with a half resolution in each channel can be included) [Li: col. 29, line 56-58]; (i.e. generate any of the feature maps described herein) [Li: col. 18, line 4]); selecting (i.e. the process 800 determines (e.g., selects, calculates, chooses, etc.)) [Li: col. 13, line 61-62] one or more of the plurality of compressed channels ((i.e. FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example, a frame 306. At the next level, the frame 306 can be divided into a series of segments 308 or planes. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, the frame 306 of color video data can include a luminance (or luma) plane and two chrominance (or chroma) planes) [Li: col. 7, line 38-42; Fig. 3]; (i.e. The pixels can include information representing an image captured in the frame, such as luminance information, color information, and location information. In an example, a block, such as a 16x16-pixel block as shown, can include a luminance block 660, which can include luminance pixels 662; and two chrominance blocks 670/680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670/680 can include chrominance pixels 690. For example, the luminance block 660 can include 16x16 luminance pixels 662, and each chrominance block 670/680 can include 8x8 chrominance pixels 690, as shown. Although one arrangement of blocks is shown, any arrangement can be used. Although FIG. 6 shows NxN blocks, in some implementations, NxM, where N≠M, blocks can be used. For example, 32x64 blocks, 64x32 blocks, 16x32 blocks, 32x16 blocks, or any other size blocks can be used.) [Li: col. 10, line 38-42; Fig. 6]; (i.e. Referring again to FIG. 9, the branch 903-A convolves, with the block 902, 256 filters, each having a size 8x8. A stride that is equal to the size of the filters (i.e., a stride that is equal to 8) is used. As a result, 256 feature maps (i.e., the feature maps 904), each of size 8x8, are extracted) [Li: col. 18, line 34-38; Fig. 9]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique that is used during the rate-distortion process to determine which channels or bitstreams should be selected. This method is also a statistical approach.) from the compressed representation (i.e. information to decode the block may be entropy coded into block, frame, slice, and/or section headers within the compressed bitstream 420. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream; these terms will be used interchangeably herein.) [Li: col. 8, line 52-57]; and performing a learning task ((i.e. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process) [Li: col. 18, line 44-46]; (i.e. a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 41-42]; (i.e. determining the rate and distortion for various partitioning modes to compare those modes and select a best mode, a machine-learning model can be used to estimate or infer the best mode) [Li: col. 4, line 39-42]; (i.e. a metric can be computed for each of the examined combinations and the respective metrics compared. In an example, the metric can combine the rate and distortion described above to produce a rate-distortion (RD) value or cost. The RD value or cost may be a single scalar value. As mentioned, a best mode can be selected from many possible combinations.) [Li: col. 4, line 5-11]; (i.e. In some examples, a feature compression rate can be applied to a machine-learning model to expand or reduce the number of features in the model. For example, the feature compression rate can be multiplied by all feature maps for feature expansion (or reduction)) [Li: col. 17, line 60-64]; (i.e. Given the above, if the value of the QP itself is used as an input to a machine-learning model, a disconnect may result between how the QP is used in evaluating the RD cost and how the QP is used in training machine-learning models. For codecs that use QP in the determination of RD cost, better performance can be achieved by using non-linear (e.g., exponential, quadratic, etc.) forms of the QPs as input to machine-learning models as compared to using linear (e.g., scalar) forms of the QPs. Better performance can mean smaller network size and/or better inference performance) [Li: col. 20, line 56-65] – Note: QP selection is a well-known technique that is used during the rate-distortion analysis to determine which channels or bitstreams should be selected) on the one or more selected compressed channels ((i.e. same size. In machine learning, filter kernels (i.e., the real numbers which constitute the values of the kernels) can be learned in the training process. The branch 903-B extracts 256 feature maps (i.e., feature maps 908), each of size 8x8. The branch 903-B first extracts, at a first layer of the branch 903-B, feature maps 906 by convolving the block 902 with 128 filters, each of size 4x4, and using a stride of 4 (i.e., a stride that is equal to the filter size). At a second layer of the branch 903-B, each of the 128 feature maps of the feature maps 906 is convolved with two 2x2 filters, using a stride of 2, thereby resulting in the feature maps 908.) [Li: col. 18, line 44-55]; (i.e. Machine learning can be used to reduce the computational complexity in mode decisions) [Li: col. 14, line 36-37]).
In the same field of endeavor, Zhang further discloses the claim limitations as follows:
selecting one or more of the plurality of compressed channels from the compressed representation ((i.e. extracted from the input picture are selected to provide an advantageous combination of features for high quality QP selection and resultant video coding. For example, features including a grid based combination of prediction distortion and picture variance provide along with target bitrate and picture resolution provide features that result in accurate QP selection with low computational requirements. Such techniques provide high accuracy QP prediction (e.g., about 95% accurate as compared to exhaustive QP searches) that are as accurate as multiple pass techniques. Furthermore, such techniques are highly accurate in scene change scenarios where no information correlation to previous frames is available.) [Zhang: col. 3, line 41-54]; (i.e. The training pictures may be selected using any suitable technique or techniques. For example, the training pictures may be selected to include a wide range of video picture scenarios) [Zhang: col. 10, line 53-56]); and performing a learning task on the one or more selected compressed channels ((i.e. The techniques discussed herein use such deep learning neural network training to automatically analyze the input pictures using the selected features and via the deep learning neural network to predict) [Zhang: col. 12, line 19-22]; (i.e. wherein the machine learning engine comprises a neural network trained using a training corpus mapping that maps a plurality of training video picture features and target bitrate combinations to corresponding quantization parameters) [Zhang: col. 28, line 32-36]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Zhang to program the processor to perform the selection task.
Therefore, the combination of Li with Zhang would enable the system to achieve appropriate QP selection for high-quality video coding with low computational complexity [Zhang: col. 3, line 41-54].

Reference Notice 
The additional prior art made of record in the Notice of References Cited and not relied upon is considered pertinent to applicant's disclosure.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip Dang whose telephone number is (408) 918-7529.  The examiner can normally be reached on Monday-Thursday between 8:30 am - 5:00 pm (PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath Perungavoor, can be reached at 571-272-7455.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Philip P. Dang/
Primary Examiner, Art Unit 2488