DETAILED ACTION
This Office Action is in response to the Applicant's communication filed on March 1, 2022, which amends independent claims 1 and 9, amends dependent claims 2, 8, 10, and 14-15, and presents arguments. The communication is hereby acknowledged. Claims 1-18 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s arguments filed on March 1, 2022, have been fully considered. Applicant's amendments are mainly grammatical, and no substantive limitation is added or removed.
Applicant argues that the prior art, Fenney, teaches a compressor/decompressor unit that includes a single ALU, not a plurality of ALUs as recited in independent claim 1:
“Respectfully, a compressor/decompressor unit that includes a single ALU is NOT the same as an execution unit that includes multiple ALUs, much less multiple ALUs that share a local memory. Indeed, nowhere does Fenney disclose or suggest that the compressor/decompressor unit includes multiple ALUs that share a local memory in the compressor/decompressor unit. To the contrary, Fenney describes that the "data storage" is attached to a single ALU. Thus, even assuming arguendo that Fenney's data storage could somehow correspond to the "local shared memory" of claim 1 (a proposition not accepted by Applicant), Fenney would still fail to disclose or suggest "the execution unit comprising arithmetic logic units (ALUs) and a local shared memory shared by all the ALUs in the execution unit ... , each tile of each tile group being processed by a plurality of the ALUs of the execution unit, wherein each ALU of [a] plurality of [the] ALUs process[es] the tile," as recited in claim 1.”
Examiner replies that Fenney teaches that color images may be divided into four channels (RGBW; see Fenney Fig. 3, and Fig. 6A, which shows a red channel histogram). Fig. 4 of Fenney shows the processing diagram for one channel, and each channel has one ALU that operates in parallel with the ALUs of the other channels. Thus, for color data input, Fenney teaches that the color data is divided into four data channels (RGBW), that the RGB components pass through the color conversion process and are converted into YCmCn (a different color space), and that four channels are used to process these four data, each channel having one ALU, so that four ALUs process the data (YCmCn-W) in parallel (See Fenney: [0081], “This unit, for performance reasons, performs operations on many channels and or pixels in parallel”). Therefore, Fenney teaches that four ALUs process color data in parallel, i.e., that a plurality of ALUs is involved. Thus, Applicant's argument on this point is not persuasive.
Applicant argues that the prior art, Fenney, does not teach the dedicated register space for use solely by the respective ALU as recited in independent claim 1:
“Respectfully, each of a reformat unit and a display output unit is different than an ALU. Moreover, a register file that may be used by multiple devices (e.g., the reformat unit and the display output unit) is NOT the same as a dedicated register space for use solely by a particular ALU. Even assuming arguendo that Fenney's ALU could somehow correspond to one of "the ALUs" of claim 1 (a proposition not accepted by Applicant), nowhere does Fenney disclose or suggest that the register file is used solely by the ALU. Accordingly, Applicant respectfully submits that Fenney also does not disclose or suggest "wherein each ALU includes dedicated register space for use solely by the respective ALU," as recited in claim 1.”
Examiner replies that Fenney teaches that color images may be divided into four channels (RGBW; see Fenney Fig. 3, and Fig. 6A, which shows a red channel histogram) of 8-bit unsigned integer data (UINT8). Fig. 4 of Fenney shows the processing diagram for one channel, and each ALU accesses the portion of the register file that stores the 8-bit elements. Thus, Fenney teaches that each ALU accesses storage used solely for the corresponding channel data. Therefore, Applicant's argument on this point is not persuasive.
Applicant further argues that the prior art, Fenney, does not teach the dedicated register space for use solely by the respective ALU, or the limitations that rely on it, as recited in independent claim 1:
“Moreover, because Fenney fails to disclose or suggest "wherein each ALU includes dedicated register space for use solely by the respective ALU," Fenney also cannot disclose or suggest "wherein each ALU of the plurality of ALUs processing the tile: ... stores all of the plurality of colour component values of the second colour space in the respective dedicated register space without storing the colour component values of the second colour space in the local shared memory," or "processes the colour component values of one of the colour component planes of the second colour space by performing a discrete wavelet transformation on the colour component values of the one colour component plane of the second colour space from the respective dedicated register space, while discarding the remaining colour component values stored in the respective dedicated register space, to produce wavelet coefficients stored in the respective dedicated register space without storing the wavelet coefficients in the local shared memory," much less "quantizes the wavelet coefficients from the respective dedicated register space to produce quantized wavelet coefficients stored in the respective dedicated register space without storing the quantized wavelet coefficients in the local shared memory," as claim 1 further recites”. 
Examiner replies that Fenney may not literally teach a dedicated register space for each ALU, but Fenney does teach that each of the four color data channels has a corresponding data array stored in an associated register file, and that the ALU for that channel accesses the register file storing the channel data, so that the ALUs can process their respective channel data in parallel. Thus, Fenney teaches a unique memory portion for each ALU. Therefore, Applicant's argument on this point is not persuasive. Because none of the above arguments is persuasive, Applicant's remaining arguments, which relate to the dependent claims and raise the same or similar points, are not persuasive either.
Examiner further respectfully notes that Applicant's arguments have been fully considered and are not persuasive. The present action is made final.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-12 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Fenney et al. (US 20090046937 A1) in view of Schwartz et al. (US 20030215146 A1).
Regarding claim 1, Fenney teaches a method of processing image data for transmittal to a display device, the method (See Fenney: Fig. 1, and [0073], "FIG. 1 is a schematic illustration of a graphics rendering system in an electronic device. A CPU 1 communicates with a graphics rendering subsystem 2, which accesses buffer data, such as framebuffer and geometry information stored in a graphics memory 3. Rendered images stored in the memory may then be transferred to an output device 4, such as a display") comprising:
receiving a frame of image data, wherein the frame of image data is divided into a plurality of tile groups, each tile group composed of a plurality of tiles, wherein each tile comprises a plurality of pixels, each pixel having a plurality of colour component values of a first colour space, wherein each tile includes a plurality of colour component planes of the first colour space having the respective colour component values for the pixels forming the tile (See Fenney: Fig. 1, and [0074], "In a rendering system in accordance with the invention, a channel of image data, which can be thought of as a two-dimensional data set, is divided into a series of arrays or tiles. Each tile is compressed individually. The size of the tiles may be arbitrarily chosen, but power of 2 dimensions are preferable, and need not be uniform. The size of the tiles chosen represents a balance between improving compression rate (large tiles) and lower latency and smaller internal data storage size (small tiles). It has been found that a tile size of around 32×32 pixels is a good compromise for most practical applications. The image data can be set of scalar values and/or vectors"; and Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform");
storing the received frame of image data in an input buffer (See Fenney: Figs. 2-4, and [0081], "A compressor/decompressor unit 20 that can perform these tasks is shown schematically in FIG. 4. The process steps of the unit are controlled by a state machine 200 that coordinates all operations inside the compression/decompression unit. Referring to the compression process, the output of the renderer 10 is fed to a reformatting unit 205 that performs trivial demultiplexing of the data. This corresponds to parts of step 120 of FIG. 3. The reformatted data is then held in a storage area 210 that preferably consists of a register file that provides parallel access to many data elements every clock cycle. Attached to the data store is a 'parallel' arithmetic logic unit (ALU) 220 that performs the operations needed for the colour transform, corresponding to step 125 of FIG. 3, and the wavelet transform corresponding to step 130 of FIG. 3. This unit, for performance reasons, performs operations on many channels and or pixels in parallel"); and
processing each tile group in an execution unit, the execution unit comprising arithmetic logic units (ALUs) and a local shared memory shared by all the ALUs in the execution unit, wherein each ALU includes dedicated register space for use solely by the respective ALU, each tile of each tile group being processed by a plurality of the ALUs of the execution unit, wherein each ALU of the plurality of ALUs processing the tile (See Fenney: Figs. 2-4, and [0081], "A compressor/decompressor unit 20 that can perform these tasks is shown schematically in FIG. 4. The process steps of the unit are controlled by a state machine 200 that coordinates all operations inside the compression/decompression unit. Referring to the compression process, the output of the renderer 10 is fed to a reformatting unit 205 that performs trivial demultiplexing of the data. This corresponds to parts of step 120 of FIG. 3. The reformatted data is then held in a storage area 210 that preferably consists of a register file that provides parallel access to many data elements every clock cycle. Attached to the data store is a 'parallel' arithmetic logic unit (ALU) 220 that performs the operations needed for the colour transform, corresponding to step 125 of FIG. 3, and the wavelet transform corresponding to step 130 of FIG. 3. This unit, for performance reasons, performs operations on many channels and or pixels in parallel"): 
performs a reversible colour transformation on the colour component values of the plurality of colour component planes in the first colour space of the tile to transform the colour component values in the plurality of colour component planes of the first colour space into colour component values in a plurality of colour component planes of a second colour space and stores all of the plurality of colour component values of the second colour space in the respective dedicated register space without storing the colour component values of the second colour space in the local shared memory (See Fenney: Fig. 1, and [0073], "FIG. 1 is a schematic illustration of a graphics rendering system in an electronic device. A CPU 1 communicates with a graphics rendering subsystem 2, which accesses buffer data, such as framebuffer and geometry information stored in a graphics memory 3. Rendered images stored in the memory may then be transferred to an output device 4, such as a display");
processes the colour component values of one of the colour component planes of the second colour space by performing a discrete wavelet transformation on the colour component values of the one colour component plane of the second colour space from the respective dedicated register space, while discarding the remaining colour component values stored in the respective dedicated register space, to produce wavelet coefficients stored in the respective dedicated register space without storing the wavelet coefficients in the local shared memory (See Fenney: Fig. 3, and [0119], "Another known colour model is the Reversible Colour Transform (RCT) of JPEG2000 described in "A Reversible Color Transform for 16-bit-color Picture Coding", Li et al, Proceedings of the 12th annual ACM international conference on Multimedia, 2004, the contents of which are incorporated herein by reference"; and [0076], "FIG. 3 is a schematic illustration of a lossless compression method to be performed by the compressor/decompressor unit in accordance with the present invention. FIG. 3 illustrates the compression method for different data formats, with different data formats shown in separate columns, (a), (b), (c) and (d). The method steps for data compression are illustrated in order from the top of the figure to the bottom. For data decompression, the steps are simply reversed and so are shown from the bottom of the page to the top"); and
quantizes the wavelet coefficients from the respective dedicated register space to produce quantized wavelet coefficients stored in the respective dedicated register space without storing the quantized wavelet coefficients in the local shared memory (See Fenney: Fig. 1, and [0152], "To further increase the compression ratio, the technique can be extended to support a lossy compression option. This can be achieved by zeroing less significant bits in the wavelet coefficients prior to encoding with a modified encoding scheme. The smaller number of possible values thus allows a greater compression ratio. More drastically, some of the higher frequency terms could be quantised to zero, which will also have a similar effect. However, the wavelet 'update' step, which is not used in the lossless version, may be required to maintain quality");
wherein the quantized wavelet coefficients for all the colour component planes of the second colour space for each tile are then entropy encoded into variable length codes and the variable length codes for all the tiles of the tile group are assembled together for transmittal to a display device (See Fenney: Fig. 3, and [0080], "To further increase the compression ratio, the technique can be extended to support a lossy compression option. This can be achieved by zeroing less significant bits in the wavelet coefficients prior to encoding with a modified encoding scheme. The smaller number of possible values thus allows a greater compression ratio. More drastically, some of the higher frequency terms could be quantised to zero, which will also have a similar effect. However, the wavelet 'update' step, which is not used in the lossless version, may be required to maintain quality"; and [0082], "Also connected to the data store, 210, is the entropy encoding/decoding unit 230, which performs entropy encoding on the coefficients that result from the colour and wavelet transformations. This corresponds to step 140 of FIG. 3. The variable length encoded data is written to a data buffer 240 before being transferred to memory via the memory interface 15").
However, Fenney fails to explicitly disclose the colour component values of the plurality of colour component planes in the first colour space of the tile.
Schwartz, however, teaches the colour component values of the plurality of colour component planes in the first colour space of the tile (See Schwartz: Figs. 17-21, and [0219], "Layer 0 contains all data not quantized away by quantizer 0. This would be luminance data only: all of 3LL; all but 4 bitplanes of 2HL, 2LH, 3HL, 3LH and 3HH; all but 5 bitplanes of 2HH and all but 6 bitplanes of 1HL and 1LH. Layer 1 contains all data not in layer 0 and not quantized away by quantizer 1. This would be luminance bitplanes 5 for 1HL and 1LH, bitplane 4 for 2HH, bitplane 3 for 3HL and 3LH; all 3LL chrominance; all but 5 bitplanes for chrominance 3HL and 3LH; and all but 6 bitplanes for chrominance 2HL, 2LH and 3HH. Finally, layer 15 would contain the LSB of 1LH chrominance").
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Fenney to have the colour component values of the plurality of colour component planes in the first colour space of the tile as taught by Schwartz, in order to allow the buffer to store a fraction of the coded data, so that the first data can be output (transmitted or stored) sooner and the second pass through the data can be faster because there is less data to process (See Schwartz: Figs. 15A-B, and [0226], "The above process is advantageous in that it allows the buffer to store a fraction of the coded data, the first data can be output (transmitted or stored) sooner, and the second pass through the data can be faster because there is less data to process. Also less memory is required for buffering"). Fenney teaches a method and system that may convert color data into a different color space, divide the image data into sets of data arrays, perform a wavelet transform on the data arrays to produce wavelet coefficients, entropy encode the wavelet coefficients, and transmit the entropy encoded wavelet coefficients to the user device, which reverses the encoding process to render the original images. Schwartz teaches a system and method that may quantize and compress different color planes (bitplanes) at different compression rates in order to produce sharp images and reduce memory usage. Therefore, it would have been obvious to one of ordinary skill in the art to modify Fenney in view of Schwartz to process color images differently according to the color plane. The motivation to modify Fenney in view of Schwartz is "Use of known technique to improve similar devices (methods, or products) in the same way".
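For illustration only, the reversible colour transformation and the lossy option referred to in the Fenney passages cited above ([0119] and [0152]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not a characterization of Fenney's implementation or of the claimed method: it uses the integer Reversible Colour Transform of JPEG2000 as the example transform, and the function names rct_forward, rct_inverse, and zero_low_bits are hypothetical.

```python
def rct_forward(r, g, b):
    """JPEG2000-style integer reversible colour transform (assumed example)."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4), luminance-like value
    cb = b - g                 # chrominance difference B - G
    cr = r - g                 # chrominance difference R - G
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward; recovers the original integers."""
    g = y - ((cb + cr) >> 2)   # >> on Python ints is a floor division by 4
    r = cr + g
    b = cb + g
    return r, g, b


def zero_low_bits(coeff, bits):
    """Lossy step (assumption): zero the `bits` least significant magnitude bits."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) >> bits) << bits)


if __name__ == "__main__":
    pixel = (200, 13, 255)
    transformed = rct_forward(*pixel)
    print(transformed)                # (120, 242, 187)
    print(rct_inverse(*transformed))  # (200, 13, 255)
    print(zero_low_bits(-13, 2))      # -12
```

The round trip returns the original pixel exactly, which is what makes the transform reversible; only the bit-zeroing step discards information.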
Regarding claim 2, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney teaches the method of claim 1, wherein the first colour space is a red, green, and blue (RGB) colour space (See Fenney: Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform").
Regarding claim 3, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney teaches the method of claim 1, wherein the second colour space is a luminance-chrominance colour space (See Fenney: Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform").
Regarding claim 4, Fenney and Schwartz teach all the features with respect to claim 3 as outlined above. Further, Fenney teaches the method of claim 3, wherein the second colour space is a YUV colour space (See Fenney: Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform". Note that the terms YUV, YCbCr, YPbPr, etc., are sometimes used ambiguously and interchangeably).
Regarding claim 5, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney teaches the method of claim 1, wherein the discrete wavelet transformation comprises a Haar transform (See Fenney: Fig. 13, and [0149], "The Haar wavelet is beneficial for the MSAA raw data as, within each over-sampled pixel, there is a very high probability of replicated colours. Because a linear wavelet uses both neighbouring even samples for the prediction, it would thus predict a linear blend across the MSAA data's 2×2 block's borders. This means odd samples in the first wavelet level are usually predicted with less error when using a Haar instead of a linear wavelet. This is illustrated in FIG. 13, which shows a pair of 2×2 multi-sampled pixels side by side. Both pixels are crossed by the same two triangles, 1100 and 1101. The former varies from light to dark green due to shading or texturing but, as MSAA is being performed, the colour for each triangle is evaluated only once per pixel and replicated to all samples in the pixel that are covered by the triangle. For the purposes of illustration, we just consider just the application of the wavelet to the first row of data").
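For illustration only, the prediction behaviour described in Fenney [0149] can be sketched as follows. This is a minimal sketch under stated assumptions, not Fenney's implementation: it assumes a Haar-style lifting step in which each odd sample is predicted from the preceding even sample and no update step is applied, and the function names haar_predict_forward and haar_predict_inverse are hypothetical.

```python
def haar_predict_forward(samples):
    """One Haar-style level: keep even samples, store odd-minus-even residuals."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    evens = samples[0::2]
    odds = samples[1::2]
    # Each odd sample is predicted by the even sample immediately before it.
    details = [odd - even for even, odd in zip(evens, odds)]
    return evens, details


def haar_predict_inverse(evens, details):
    """Rebuild the original row from even samples and residuals."""
    row = []
    for even, detail in zip(evens, details):
        row.extend([even, even + detail])
    return row


if __name__ == "__main__":
    # First row of two side-by-side 2x2 multi-sampled pixels whose samples
    # are replicated within each pixel (values are made up for illustration).
    row = [10, 10, 40, 40]
    evens, details = haar_predict_forward(row)
    print(evens, details)                         # [10, 40] [0, 0]
    print(haar_predict_inverse(evens, details))   # [10, 10, 40, 40]
```

With replicated samples inside each pixel, every residual is zero, which is consistent with the passage's point that a Haar wavelet predicts such data with less error than a linear wavelet that blends across pixel borders.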
Regarding claim 6, Fenney and Schwartz teach all the features with respect to claim 5 as outlined above. Further, Fenney teaches the method of claim 5, wherein the Haar transformation is repeated a plurality of times during the processing to produce the wavelet coefficients (See Fenney: Fig. 13, and [0151], "For comparison, the right hand side of FIG. 13 uses a linear wavelet. The same predicted element, 1120, this time obtains its prediction from the average of two different even samples, 1125 and 1126, which means that it averages light green from 1125, and dark green from 1126, to produce a mid-intensity green predicted result, 1130. This means that a non-zero wavelet coefficient will be stored requiring more bits to encode. As MSAA typically has repeated sample values inside each pixel, prediction using values in a neighbouring pixel is more likely to be counterproductive"; and [0149], "The Haar wavelet is beneficial for the MSAA raw data as, within each over-sampled pixel, there is a very high probability of replicated colours. Because a linear wavelet uses both neighbouring even samples for the prediction, it would thus predict a linear blend across the MSAA data's 2×2 block's borders. This means odd samples in the first wavelet level are usually predicted with less error when using a Haar instead of a linear wavelet. This is illustrated in FIG. 13, which shows a pair of 2×2 multi-sampled pixels side by side. Both pixels are crossed by the same two triangles, 1100 and 1101. The former varies from light to dark green due to shading or texturing but, as MSAA is being performed, the colour for each triangle is evaluated only once per pixel and replicated to all samples in the pixel that are covered by the triangle. For the purposes of illustration, we just consider just the application of the wavelet to the first row of data").
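For illustration only, repeating a Haar-style step a plurality of times can be sketched as follows. This is a minimal sketch under stated assumptions rather than the method of Fenney or of the claims: each level is assumed to keep the even samples and store odd-minus-even residuals, the next level is applied to the retained even samples, and the function names haar_multilevel and haar_multilevel_inverse are hypothetical.

```python
def haar_multilevel(samples, levels):
    """Apply a Haar-style predict step repeatedly to the retained even samples."""
    approx = list(samples)
    detail_levels = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2 != 0:
            break  # cannot split this level into even/odd pairs
        evens = approx[0::2]
        odds = approx[1::2]
        detail_levels.append([odd - even for even, odd in zip(evens, odds)])
        approx = evens  # the next level works on the coarser signal
    return approx, detail_levels


def haar_multilevel_inverse(approx, detail_levels):
    """Undo haar_multilevel, starting from the coarsest level."""
    current = list(approx)
    for details in reversed(detail_levels):
        rebuilt = []
        for even, detail in zip(current, details):
            rebuilt.extend([even, even + detail])
        current = rebuilt
    return current


if __name__ == "__main__":
    row = [10, 10, 12, 12, 40, 40, 44, 44]  # made-up sample values
    approx, detail_levels = haar_multilevel(row, levels=3)
    print(approx)         # [10]
    print(detail_levels)  # [[0, 0, 0, 0], [2, 4], [30]]
    print(haar_multilevel_inverse(approx, detail_levels) == row)  # True
```

In this sketch the first level yields only zero residuals for replicated samples, while the later levels carry the differences between neighbouring pixels.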
Regarding claim 7, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney teaches the method of claim 1, wherein at least part of the entropy encoding is performed by each ALU on the quantized wavelet coefficients and the variable length codes are stored in the local shared memory (See Fenney: Figs. 2-4, and [0082], "Also connected to the data store, 210, is the entropy encoding/decoding unit 230, which performs entropy encoding on the coefficients that result from the colour and wavelet transformations. This corresponds to step 140 of FIG. 3. The variable length encoded data is written to a data buffer 240 before being transferred to memory via the memory interface 15").
Regarding claim 8, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney teaches the method of claim 1, wherein the method is performed at a Graphics Processing Unit (See Fenney: Figs. 1-2, and [0075], "FIG. 2 shows the rendering system of a preferred embodiment of the invention in more detail. The graphics rendering system 2 includes a renderer 10, a compressor/decompressor unit 20, memory interface 15 and a display output unit 30. The renderer 10 performs scan conversion of graphics primitives, such as triangles and lines, using known techniques such as Z-tests and texture mapping. The renderer operates in a tile-based order. The renderer may contain cache units to reduce memory traffic. Some data is read or written by the renderer, directly to the graphics memory 3 via the memory interface unit 15 but for other data, such as the framebuffer, it preferably goes via the compressor/decompressor unit 20. This unit reduces the amount of data that needs to be transferred across the external memory bus. The compressor/decompressor unit has access to the memory interface unit 15. The display output unit 30 sends completed image data to the display device 4. This may be an uncompressed image, in which case it is accessed directly from the memory interface unit 15 or it may be compressed data, in which case it will be accessed via the compressor/decompressor 20. Although shown as a single entity, the compressor/decompressor unit may contain multiple parallel compression/decompression units for enhanced performance reasons").
Regarding claim 9, Fenney and Schwartz teach all the features with respect to claim 1 as outlined above. Further, Fenney and Schwartz teach a Graphics Processing Unit (GPU) configured to process image data for transmittal to a display device (See Fenney: Fig. 1, and [0073], "FIG. 1 is a schematic illustration of a graphics rendering system in an electronic device. A CPU 1 communicates with a graphics rendering subsystem 2, which accesses buffer data, such as framebuffer and geometry information stored in a graphics memory 3. Rendered images stored in the memory may then be transferred to an output device 4, such as a display"), by:
receiving a frame of image data, wherein the frame of image data is divided into a plurality of tile groups, each tile group composed of a plurality of tiles, wherein each tile comprises a plurality of pixels, each pixel having a plurality of colour component values of a first colour space, wherein each tile includes a plurality of colour component planes of the first colour space having the respective colour component values for the pixels forming the tile (See Fenney: Fig. 1, and [0074], "In a rendering system in accordance with the invention, a channel of image data, which can be thought of as a two-dimensional data set, is divided into a series of arrays or tiles. Each tile is compressed individually. The size of the tiles may be arbitrarily chosen, but power of 2 dimensions are preferable, and need not be uniform. The size of the tiles chosen represents a balance between improving compression rate (large tiles) and lower latency and smaller internal data storage size (small tiles). It has been found that a tile size of around 32×32 pixels is a good compromise for most practical applications. The image data can be set of scalar values and/or vectors"; and Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform");
storing the received frame of image data in an input buffer (See Fenney: Figs. 2-4, and [0081], "A compressor/decompressor unit 20 that can perform these tasks is shown schematically in FIG. 4. The process steps of the unit are controlled by a state machine 200 that coordinates all operations inside the compression/decompression unit. Referring to the compression process, the output of the renderer 10 is fed to a reformatting unit 205 that performs trivial demultiplexing of the data. This corresponds to parts of step 120 of FIG. 3. The reformatted data is then held in a storage area 210 that preferably consists of a register file that provides parallel access to many data elements every clock cycle. Attached to the data store is a 'parallel' arithmetic logic unit (ALU) 220 that performs the operations needed for the colour transform, corresponding to step 125 of FIG. 3, and the wavelet transform corresponding to step 130 of FIG. 3. This unit, for performance reasons, performs operations on many channels and or pixels in parallel"); and
processing each tile group in an execution unit, the execution unit comprising arithmetic logic units (ALUs) and a local shared memory shared by all the ALUs in the execution unit, wherein each ALU includes dedicated register space for use solely by the respective ALU, each tile of each tile group being processed by a plurality of the ALUs of the execution unit, wherein each ALU of the plurality of ALUs processing the tile (See Fenney: Figs. 2-4, and [0081], "A compressor/decompressor unit 20 that can perform these tasks is shown schematically in FIG. 4. The process steps of the unit are controlled by a state machine 200 that coordinates all operations inside the compression/decompression unit. Referring to the compression process, the output of the renderer 10 is fed to a reformatting unit 205 that performs trivial demultiplexing of the data. This corresponds to parts of step 120 of FIG. 3. The reformatted data is then held in a storage area 210 that preferably consists of a register file that provides parallel access to many data elements every clock cycle. Attached to the data store is a 'parallel' arithmetic logic unit (ALU) 220 that performs the operations needed for the colour transform, corresponding to step 125 of FIG. 3, and the wavelet transform corresponding to step 130 of FIG. 3. This unit, for performance reasons, performs operations on many channels and or pixels in parallel"):
performs a reversible colour transformation on the colour component values of the plurality of colour component planes in the first colour space of the tile to transform the colour component values in the plurality of colour component planes of the first colour space into colour component values in a plurality of colour component planes of a second colour space (See Schwartz: Figs. 17-21, and [0219], "Layer 0 contains all data not quantized away by quantizer 0. This would be luminance data only: all of 3LL; all but 4 bitplanes of 2HL, 2LH, 3HL, 3LH and 3HH; all but 5 bitplanes of 2HH and all but 6 bitplanes of 1HL and 1LH. Layer 1 contains all data not in layer 0 and not quantized away by quantizer 1. This would be luminance bitplanes 5 for 1HL and 1LH, bitplane 4 for 2HH, bitplane 3 for 3HL and 3LH; all 3LL chrominance; all but 5 bitplanes for chrominance 3HL and 3LH; and all but 6 bitplanes for chrominance 2HL, 2LH and 3HH. Finally, layer 15 would contain the LSB of 1LH chrominance") and stores all of the plurality of colour component values of the second colour space in the respective dedicated register space without storing the colour component values of the second colour space in the local shared memory (See Fenney: Fig. 1, and [0073], "FIG. 1 is a schematic illustration of a graphics rendering system in an electronic device. A CPU 1 communicates with a graphics rendering subsystem 2, which accesses buffer data, such as framebuffer and geometry information stored in a graphics memory 3. Rendered images stored in the memory may then be transferred to an output device 4, such as a display");
processes the colour component values of one of the colour component planes of the second colour space by performing a discrete wavelet transformation on the colour component values of the one colour component plane of the second colour space from the respective dedicated register space, while discarding the remaining colour component values stored in the respective dedicated register space, to produce wavelet coefficients stored in the respective dedicated register space without storing the wavelet coefficients in the local shared memory (See Fenney: Fig. 3, and [0119], "Another known colour model is the Reversible Colour Transform (RCT) of JPEG2000 described in "A Reversible Color Transform for 16-bit-color Picture Coding", Li et al, Proceedings of the 12.sup.th annual ACM international conference on Multimedia, 2004, the contents of which are incorporated  herein by reference"; and [0076], "FIG. 3 is a schematic illustration of a lossless compression method to be performed by the compressor/decompressor unit in accordance with the present invention. FIG. 3 illustrates the compression method for different data formats, with different data formats shown in separate columns, (a), (b), (c) and (d). The method steps for data compression are illustrated in order from the top of the figure to the bottom. For data decompression, the steps are simply reversed and so are shown from the bottom of the page to the top"); and 
quantizes the wavelet coefficients from the respective dedicated register space to produce quantized wavelet coefficients stored in the respective dedicated register space without storing the quantized wavelet coefficients in the local shared memory (See Fenney: Fig. 1, and [0152], "To further increase the compression ratio, the technique can be extended to support a lossy compression option. This can be achieved by zeroing less significant bits in the wavelet coefficients prior to encoding with a modified encoding scheme. The smaller number of possible values thus allows a greater compression ratio. More drastically, some of the higher frequency terms could be quantised to zero, which will also have a similar effect. However, the wavelet 'update' step, which is not used in the lossless version, may be required to maintain quality"),
wherein the quantized wavelet coefficients for all the colour component planes of the second colour space for each tile are then entropy encoded into variable length codes and the variable length codes for all the tiles of the tile group are assembled together for transmittal to a display device (See Fenney: Fig. 3, and [0080], "To further increase the compression ratio, the technique can be extended to support a lossy compression option. This can be achieved by zeroing less significant bits in the wavelet coefficients prior to encoding with a modified encoding scheme. The smaller number of possible values thus allows a greater compression ratio. More drastically, some of the higher frequency terms could be quantised to  zero, which will also have a similar effect. However, the wavelet 'update' step, which is not used in the lossless version, may be required to maintain quality"; and [0082], "Also connected to the data store, 210, is the entropy encoding/decoding unit 230, which performs entropy encoding on the coefficients that result from the colour and wavelet transformations. This corresponds to step 140 of FIG. 3. The variable length encoded data is written to a data buffer 240 before being transferred to memory via the memory interface 15").
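For illustration only, the colour transform and the lossy option referenced in the passages quoted above can be sketched as follows. The sketch uses the integer Reversible Colour Transform of JPEG2000 named in Fenney [0119], not Fenney's YCmCn transform, and models "zeroing less significant bits" ([0152]) as a two's-complement bit mask. The function names and the parameter `bits` are hypothetical and appear in neither reference; the sketch is not a reproduction of either reference's implementation or of the claimed method.

```python
def rct_forward(r, g, b):
    """JPEG2000 reversible colour transform: integer RGB -> (Y, Cb, Cr)."""
    y = (r + 2 * g + b) // 4  # floor division keeps the transform integer-exact
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - (cb + cr) // 4  # same floor division as the forward step
    r = cr + g
    b = cb + g
    return r, g, b


def zero_low_bits(coeff, bits):
    """Lossy step (assumed form): clear the `bits` least significant bits.

    Uses Python's arithmetic shift, so negative coefficients round
    toward minus infinity (two's-complement behaviour).
    """
    return (coeff >> bits) << bits
```

Under these assumptions, `rct_inverse(*rct_forward(10, 20, 30))` returns `(10, 20, 30)`, which is the sense in which the transform is reversible, while `zero_low_bits(13, 2)` returns `12`, reducing the number of distinct coefficient values before entropy encoding.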
Regarding claim 10, Fenney and Schwartz teach all the features with respect to claim 9 as outlined above. Further, Fenney teaches a host device comprising the GPU according to claim 9, a central processing unit (CPU) and an output transport mechanism (See Fenney: Fig. 2, and [0075], "FIG. 2 shows the rendering system of a preferred embodiment of the invention in more detail. The graphics rendering system 2 includes a renderer 10, a compressor/decompressor unit 20, memory interface 15 and a display output unit 30. The renderer 10 performs scan conversion of graphics primitives, such as triangles and lines, using known techniques such as Z-tests and texture mapping. The renderer operates in a tile-based order. The renderer may contain cache units to reduce memory traffic. Some data is read or written by the renderer, directly to the graphics memory 3 via the memory interface unit 15 but for other data, such as the framebuffer, it preferably goes via the compressor/decompressor unit 20. This unit reduces the amount of data that needs to be transferred across the external memory bus. The compressor/decompressor unit has access to the memory interface unit 15. The display output unit 30 sends completed image data to the display device 4. This may be an uncompressed image, in which case it is accessed directly from the memory interface unit 15 or it may be compressed data, in which case it will be accessed via the compressor/decompressor 20. Although shown as a single entity, the compressor/decompressor unit may contain multiple parallel compression/decompression units for enhanced performance reasons").
Regarding claim 11, Fenney and Schwartz teach all the features with respect to claim 10 as outlined above. Further, Fenney teaches a system for managing display data, the system comprising the host device of claim 10 and a display device coupled to the host device by a bandwidth limited transmission medium, wherein the display device comprises:
a receiver configured to receive the variable length codes for all the tiles of the tile group via the bandwidth limited transmission medium from the host device (See Fenney: Figs. 2-4, and [0082], "Also connected to the data store, 210, is the entropy encoding/decoding unit 230, which performs entropy encoding on the coefficients that result from the colour and wavelet transformations. This corresponds to step 140 of FIG. 3. The variable length encoded data is written to a data buffer 240 before being transferred to memory via the memory interface 15"; and [0117], "Due to the variable lengths of the encoded coefficients, the process of decoding of the coefficients is generally sequential. Although it can be possible to decode pairs of, say, the coefficients encoded with the shorter bit strings, it is extremely difficult to increase the number that can be done within one clock cycle. In an alternative embodiment, at least one additional parallel entropy decoder is included. The first portion of the symbols is decoded with the first entropy decoder and the second portion with the second decoder. So that the second decoder knows where the encoded symbols begin, a bit offset is also stored at the start of the compressed data. Such an offset does reduce the compression efficiency and so such an embodiment trades decode speed against compression ratio. Note that the size of the offset is bounded as the data is never expanded. Further decoders and offsets could be used to further increase speed in alternate embodiments");
a decoder configured to decode the variable length codes for all the tiles of the tile group to form the image data (See Fenney: Figs. 2-5, and [0107], "For the de-compressor to be able to decode a particular tile of data, it needs to know the encoding scheme used for that tile. The first two bits, therefore, of the encoded tile data are used to store the encoding scheme as follows"); and
an output configured to output the image data for display (See Fenney: Fig. 2, and [0075], "The display output unit 30 sends completed image data to the display device 4. This may be an uncompressed image, in which case it is accessed directly from the memory interface unit 15 or it may be compressed data, in which case it will be accessed via the compressor/decompressor 20. Although shown as a single entity, the compressor/decompressor unit may contain multiple parallel compression/decompression units for enhanced performance reasons").
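For illustration only, the bit-offset arrangement described in Fenney [0117], quoted above for the receiver, can be sketched as follows. The sketch assumes a simple unary variable-length code and a fixed 16-bit offset field, neither of which is taken from Fenney; the function names and the `offset_bits` parameter are likewise hypothetical. Bit strings are represented as Python strings of '0' and '1' for readability.

```python
def encode_symbol(n):
    """Unary code (assumed for illustration): n ones followed by a zero."""
    return "1" * n + "0"


def decode_portion(bits, start, end):
    """Sequentially decode unary symbols from bits[start:end]."""
    symbols, run = [], 0
    for bit in bits[start:end]:
        if bit == "1":
            run += 1
        else:
            symbols.append(run)
            run = 0
    return symbols


def encode_with_offset(symbols, offset_bits=16):
    """Split the symbols into two portions and prefix the bit offset
    at which the second portion begins."""
    half = len(symbols) // 2
    first = "".join(encode_symbol(s) for s in symbols[:half])
    second = "".join(encode_symbol(s) for s in symbols[half:])
    header = format(len(first), "0{}b".format(offset_bits))
    return header + first + second


def decode_with_offset(bits, offset_bits=16):
    """Decode both portions; the two calls are independent of each
    other, so they could be handed to two parallel decoders."""
    split = int(bits[:offset_bits], 2)
    body = bits[offset_bits:]
    first = decode_portion(body, 0, split)
    second = decode_portion(body, split, len(body))
    return first + second
```

In this sketch the stored offset costs `offset_bits` extra bits per compressed block, which corresponds to the trade of compression ratio against decode speed noted in the quoted passage.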
Regarding claim 12, Fenney and Schwartz teach all the features with respect to claim 11 as outlined above. Further, Fenney teaches the system according to claim 11, wherein the display device comprises a pair of display panels for displaying the frame of image data, wherein the display device and the pair of display panels are incorporated in a wearable headset (See Fenney: Fig. 1, and [0002], "Data compression, both lossless and lossy, is highly desirable in many applications including graphics frame buffers where large amounts of data are to be stored and read out. With the increasing use of graphics in small devices such as mobile communication devices the use of compressed data is therefore highly desirable").
Regarding claim 14, Fenney and Schwartz teach all the features with respect to claim 11 as outlined above. Further, Fenney teaches the system of claim 11, wherein the first colour space is a red, green, and blue (RGB) colour space (See Fenney: Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform").
Regarding claim 15, Fenney and Schwartz teach all the features with respect to claim 11 as outlined above. Further, Fenney teaches the system of claim 11, wherein the second colour space is a luminance-chrominance colour space (See Fenney: Fig. 3, and [0078], "Column (d) illustrates the method for RGB(A) colour data. RGB(A) colour data may advantageously be transformed into a different colour space, in this example, a YCmCn colour space, or transformed using the JPEG-LS transformation, as will be described in further detail. This transform step 125 is shown as preceding the wavelet transform step but may be performed subsequent to the wavelet transform step. The different channels of colour data are 8-bit unsigned integer data channels. If an alpha channel, A, is included in the RGB(A) data, it is encoded independently of the colour transform". Note that the designations of luminance-chrominance colour spaces, such as YUV, YCbCr, YPbPr, etc., are sometimes used ambiguously and interchangeably).
Regarding claim 16, Fenney and Schwartz teach all the features with respect to claim 11 as outlined above. Further, Fenney teaches the system of claim 11, wherein the discrete wavelet transformation comprises a Haar transform (See Fenney: Fig. 13, and [0149], "The Haar wavelet is beneficial for the MSAA raw data as, within each over-sampled pixel, there is a very high probability of replicated colours. Because a linear wavelet uses both neighbouring even samples for the prediction, it would thus predict a linear blend across the MSAA data's 2.times.2 block's borders. This means odd samples in the first wavelet level are usually predicted with less error when using a Haar instead of a linear wavelet. This is illustrated in FIG. 13, which shows a pair of 2.times.2 multi-sampled pixels side by side. Both pixels are crossed by the same two triangles, 1100 and 1101. The former varies from light to dark green due to shading or texturing but, as MSAA is being performed, the colour for each triangle is evaluated only once per pixel and replicated to all samples in the pixel that are covered by the triangle. For the purposes of illustration, we just consider just the application of the wavelet to the first row of data").
Regarding claim 17, Fenney and Schwartz teach all the features with respect to claim 16 as outlined above. Further, Fenney teaches the system of claim 16, wherein the Haar transformation is repeated a plurality of times during the processing to produce the wavelet coefficients (See Fenney: Fig. 13, and [0151], "For comparison, the right hand side of FIG. 13 uses a linear wavelet. The same predicted element, 1120, this time obtains its prediction from the average of two different even samples, 1125 and 1126, which means that it averages light green from 1125, and dark green from 1126, to produce a mid-intensity green predicted result, 1130. This means that a non-zero wavelet coefficient will be stored requiring more bits to encode. As MSAA typically has repeated sample values inside each pixel, prediction using values in a neighbouring pixel is more likely to be counterproductive"; and [0149], "The Haar wavelet is beneficial for the MSAA raw data as, within each over-sampled pixel, there is a very high probability of replicated colours. Because a linear wavelet uses both neighbouring even samples for the prediction, it would thus predict a linear blend across the MSAA data's 2.times.2 block's borders. This means odd samples in the first wavelet level are usually predicted with less error when using a Haar instead of a linear wavelet. This is illustrated in FIG. 13, which shows a pair of 2.times.2 multi-sampled pixels side by side. Both pixels are crossed by the same two triangles, 1100 and 1101. The former varies from light to dark green due to shading or texturing but, as MSAA is being performed, the colour for each triangle is evaluated only once per pixel and replicated to all samples in the pixel that are covered by the triangle. For the purposes of illustration, we just consider just the application of the wavelet to the first row of data").
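For illustration only, the comparison between a Haar and a linear wavelet drawn in Fenney [0149] and [0151], and the repetition of the Haar step over several levels, can be sketched as follows. The sketch assumes a predict-only lifting form (consistent with the statement in [0152] that the 'update' step is not used in the lossless version), integer samples, and an input length divisible by two at every level. The function names and the sample values are hypothetical and are not taken from Fenney.

```python
def haar_predict(samples):
    """One Haar level: each odd sample is predicted from the even
    sample immediately before it; the detail is the prediction error."""
    evens = samples[0::2]
    odds = samples[1::2]
    details = [o - e for e, o in zip(evens, odds)]
    return evens, details


def linear_predict(samples):
    """One linear level: each odd sample is predicted from the average
    of both neighbouring even samples (left neighbour reused at the edge)."""
    evens = samples[0::2]
    odds = samples[1::2]
    details = []
    for i, o in enumerate(odds):
        left = evens[i]
        right = evens[i + 1] if i + 1 < len(evens) else left
        details.append(o - (left + right) // 2)
    return evens, details


def haar_multilevel(samples, levels):
    """Repeat the Haar step on the retained even samples `levels` times."""
    low, details = list(samples), []
    for _ in range(levels):
        low, d = haar_predict(low)
        details.append(d)
    return low, details


def haar_multilevel_inverse(low, details):
    """Undo haar_multilevel, coarsest level first."""
    low = list(low)
    for d in reversed(details):
        rebuilt = []
        for e, diff in zip(low, d):
            rebuilt += [e, e + diff]
        low = rebuilt
    return low
```

For a row of two side-by-side 2x2 multi-sampled pixels with replicated values, such as `[200, 200, 80, 80]`, `haar_predict` yields the details `[0, 0]`, whereas `linear_predict` yields `[60, 0]` because it blends across the pixel border, which mirrors the non-zero coefficient described in [0151].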
Regarding claim 18, Fenney and Schwartz teach all the features with respect to claim 11 as outlined above. Further, Fenney teaches the system of claim 11, wherein at least part of the entropy encoding is performed by each ALU on the quantized wavelet coefficients and the variable length codes are stored in the local shared memory (See Fenney: Figs. 2-4, and [0082], "Also connected to the data store, 210, is the entropy encoding/decoding unit 230, which performs entropy encoding on the coefficients that result from the colour and wavelet transformations. This corresponds to step 140 of FIG. 3. The variable length encoded data is written to a data buffer 240 before being transferred to memory via the memory interface 15").

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Fenney et al. (US 20090046937 A1) in view of Schwartz et al. (US 20030215146 A1), and further in view of Tran et al. (US 20170323481 A1).
Regarding claim 13, Fenney and Schwartz teach all the features with respect to claim 12 as outlined above. However, Fenney fails to explicitly disclose the system of claim 12, wherein the wearable headset comprises a virtual reality or an augmented reality headset.
However, Tran teaches the system of claim 12, wherein the wearable headset comprises a virtual reality or an augmented reality headset (See Tran: Figs. 3-4, and [0176], "In an example, a device and system for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can take pictures of food using an imaging device selected from the group consisting of: smart glasses, visor, or other eyewear; electronically-functional glasses, visor, or other eyewear; augmented reality glasses, visor, or other eyewear; virtual reality glasses, visor, or other eyewear; and electronically-functional contact lens. In an example, a device and system for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can take pictures of food using an imaging device selected from the group consisting of: smart utensil, fork, spoon, food probe, plate, dish, or glass; and electronically-functional utensil, fork, spoon, food probe, plate, dish, or glass. In an example, a device and system for measuring a person's consumption of at least one selected type of food, ingredient, or nutrient can take pictures of food using an imaging device selected from the group consisting of: smart necklace, smart beads, smart button, neck chain, and neck pendant").
Therefore, it would have been obvious to one of ordinary skill in the art at the time the invention was effectively filed to modify Fenney to have the system of claim 12, wherein the wearable headset comprises a virtual reality or an augmented reality headset, as taught by Tran, in order to provide a more realistic augmented reality experience for users of the augmented reality device, as painted objects will appear as painted regardless of the angle from which the user views the objects using the augmented reality device (See Tran: Figs. 4A-C, and [0220], "Additionally, the painted surface may be partially or completely occluded by other objects when viewed from other perspectives, but may still remain painted in the specified fashion when once again viewed from a perspective where all or part of the surface can be seen. Advantageously, doing so provides a more realistic augmented reality experience for users of the augmented reality device, as painted objects will appear as painted regardless of the angle from which the user views the objects using the augmented reality device"). Fenney teaches a method and system that may convert the color data into a different color space, divide the image data into sets of data arrays, perform a wavelet transform on the data arrays to produce wavelet coefficients, encode the wavelet coefficients, and transmit the entropy encoded wavelet coefficients to the user device, which reverses the encoding process to render the original images and outputs the rendered images for display, and Tran teaches a system and method that may insert an image of the objects into the environment and interactively change the objects, their installation positions, etc. with virtual content displayed to the users. Therefore, it would have been obvious to one of ordinary skill in the art to modify Fenney by Tran to display the output images on augmented reality displays.
The motivation to modify Fenney by Tran is "Simple substitution of one known element for another to obtain predictable results".





Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GORDON G LIU/Primary Examiner, Art Unit 2612