Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 8, 10, 12-15, 17, 19-21, and 24-26 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Malik (U.S. Patent 6438556 B1).
Claim 1
Malik discloses a method of transforming a data file, the method executed by a processor and comprising:
segmenting the data file into data segments (col 6, line 20-21, “... the source data file 210 has been separated into three segments 212, 214, and 216. Each segment 212, 214, and 216 includes ten subsegments 218...” <examiner note: file 210 is segmented>);
creating a bit index (strings of bits 242, 244, and 246) for each data segment (segments 212, 214, and 216) having a size that is based on a configurable or preset data group unit (col 6, line 22-23, “... each bit string 242, 244, and 246 has a size/length of 10 based on the number of subsegments/configurable/preset data group unit...”) (col 6, line 53-55, “... The compressed file 240 includes a string of bits representing each of the segments 212, 214, and 216 of the source data file 210...” <examiner note: the strings of bits 242, 244, and 246 are considered to be the bit indexes. There is one string of bits for each of segments 212, 214, and 216. Each string of bits has a size/length based on the number of subsegments, which is configurable through the segmenting or the number of segments; the subsegments are the data group units>);
indexing each data segment into its corresponding bit index by reading all data group unit values within the data segment (col 6, line 55-58, “... The bits are the code words for each subsegment 218. As depicted in FIG. 5, the bits represent the numbers zero through three, indicating the code words 222, 226, 230, and 234...” <examiner note: each subsegment/data group unit within each segment 212, 214, and 216/data segment is read>) and updating the bit index based on the read values by extracting all data group unit values from the data segment, and updating the bit index to register each unique data group unit value identified, wherein each unique data group unit value has a corresponding representative bit in the bit index, and the value of the representative bit represents whether the data group unit value is present or not in the data segment (col 6, line 55-58, “... The bits are the code words for each subsegment 218. As depicted in FIG. 5, the bits represent the numbers zero through three, indicating the code words 222, 226, 230, and 234...” <examiner note: when a subsegment is read, its value is compared to the dictionary to obtain the code word, and the string of bits/bit index is updated with the code word at the position corresponding to the position of the subsegment within the segment. For instance, the 10th subsegment of segment 216 is read. The value of the subsegment is 1101001011, its code word is 3, and its position is 10th. The string of bits/bit index 246 is set with code word 3 at the 10th position>), and wherein the offset position of each bit in the bit index corresponds or is associated to the unique data group unit value that the bit represents (col 6, line 10-12, “... although the code words 222, 226, 230, and 234 are represented in base 10 in FIG. 5, each code word is two bits long. Thus, the dictionary 220 includes code words 222, 226, 230, and 234 each having the same length, two bits...” <examiner note: each code word in the strings of bits 242, 244, and 246 is two bits long and is associated with or corresponds to a data group unit value. For instance, code word 0 (i.e., two bits long) at the first position of the first bit string 242 is associated with value 1011011101 in the first row, first column of data segment 212>); and


[Image: media_image1.png]

generating an output data file (a compressed file 240) or files (multiple compressed files, one for each segment) comprising the bit indexes that represent the original data file (col 6, line 53-55, “... The compressed file 240 includes a string of bits representing each of the segments 212, 214, and 216 of the source data file 210...” col 5, line 25-27, “... In a preferred embodiment, step 106 includes generating a compressed file by providing a code word corresponding to each of the segments...” <examiner note: the compressed file 240 comprises strings of bits 242, 244, and 246, or multiple compressed files are provided, one for each string of bits 242, 244, and 246>).
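For illustration only, the claim 1 limitations quoted above (one representative bit per unique data group unit value, with the offset of each bit tied to the value it represents) can be sketched as follows. This is a minimal Python sketch of the claim wording, not Malik's method and not the applicant's implementation; the function names, the 1-byte data group unit, and the 4-byte segment size are assumptions chosen for the example.

```python
def build_bit_index(segment: bytes, unit_bytes: int = 1) -> int:
    """Return a presence bit index for one segment as a Python int.

    Bit k of the result is 1 when data group unit value k occurs in the
    segment, so the offset of a bit identifies the value it represents.
    """
    index = 0
    # Read every complete data group unit in the segment.
    for i in range(0, len(segment) - unit_bytes + 1, unit_bytes):
        value = int.from_bytes(segment[i:i + unit_bytes], "big")
        index |= 1 << value  # register the value as present
    return index


def transform(data: bytes, segment_size: int, unit_bytes: int = 1) -> list:
    """Segment the data and build one bit index per segment."""
    return [
        build_bit_index(data[i:i + segment_size], unit_bytes)
        for i in range(0, len(data), segment_size)
    ]


# Hypothetical 8-byte input split into two 4-byte segments.
indexes = transform(b"\x00\x01\x01\x03" + b"\x02\x02\x02\x02", segment_size=4)
```

In this sketch the first segment contains the values 0, 1, and 3, so bits 0, 1, and 3 of its index are set; the second segment contains only the value 2, so only bit 2 is set.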
Claim 8
Claim 1 is included; Malik discloses wherein the data group unit is a byte-group value comprising a designated number of bytes (col 6, line 22-23, “... each bit string 242, 244, and 246 has a size/length of 10 based on the number of subsegments/configurable/preset data group unit...”)
Claim 10
Claim 8 is included; Malik discloses wherein each data group unit comprises groups of consecutive bytes, or wherein each data group unit comprises groups of non-consecutive bytes (FIG. 5, each subsegment and segment is a group of consecutive bytes)
Claim 12
Claim 8 is included; Malik discloses wherein the data group unit is set as 2-byte values (col 6, line 22-23, “... each bit string 242, 244, and 246 has a size/length of 10 based on the number of subsegments/configurable/preset data group unit...”)

Claim 13
Claim 8 is included; Malik discloses wherein the size of the bit index for each data segment is a function of the number of bytes that define each data group unit (<examiner note: the size of the string of bits 242 is based on the number of subsegments/data group units in segment 212>)
Claim 14
Claim 8 is included; Malik discloses wherein the size of the bit index for each data segment corresponds to or is calculated as 256^n bits, where n corresponds to the number of bytes in each data group unit (col 6, line 55-58, “... The bits are the code words for each subsegment 218. As depicted in FIG. 5, the bits represent the numbers zero through three, indicating the code words 222, 226, 230, and 234...” <examiner note: when a subsegment is read, its value is compared to the dictionary to obtain the code word, and the string of bits/bit index is updated with the code word at the position corresponding to the position of the subsegment within the segment. For instance, the 10th subsegment of segment 216 is read. The value of the subsegment is 1101001011, its code word is 3, and its position is 10th. The string of bits/bit index 246 is set with code word 3 at the 10th position>)
Claim 15
Claim 1 is included; Malik discloses wherein the data group unit is a bit-group value comprising a designated number of bits (<examiner note: for instance, the first subsegment/data group unit of segment 212 has the value 0111011101, which is 10 bits long>)
Claim 17
Malik discloses wherein each bit-group value comprises groups of consecutive bits, or wherein each bit-group value comprises groups of non-consecutive bits (<examiner note: the first subsegment/data group unit of segment 212 comprises a group of consecutive bits>)
Claim 19
Claim 15 is included; Malik discloses wherein the data group unit is set as a bit-group value of 8 bits (col 5, line 8-10, “... the data set is broken into segments via step 102. In a preferred embodiment, each segment has the same size...” <examiner note: a segment can be any size>)
Claim 20
Claim 15 is included; Malik discloses wherein the size of the bit index for each data segment is a function of the number of bits that define each bit-group value (<examiner note: the size of the string of bits 242 is based on the number of subsegments/data group units in segment 212>)
Claim 21
Claim 15 is included; Malik discloses wherein the size of the bit index for each data segment corresponds to or is calculated as 2^n bits, where n is the number of bits in the bit-group value (<examiner note: the string of bits 242, for instance, is 10 bits or 2^n bits>)
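The size relationships recited in claims 14 and 21 can be checked with a short calculation. The sketch below assumes the recited sizes are the exponentials 256^n (n bytes per data group unit) and 2^n (n bits per bit-group value), i.e., one bit for every value the unit can represent; that reading and the function names are assumptions made for illustration, not findings of this Office action.

```python
def bit_index_size_from_bytes(unit_bytes: int) -> int:
    """Bits needed for one presence bit per value of an n-byte unit (assumed 256**n)."""
    return 256 ** unit_bytes


def bit_index_size_from_bits(unit_bits: int) -> int:
    """Bits needed for one presence bit per value of an n-bit unit (assumed 2**n)."""
    return 2 ** unit_bits


two_byte_unit = bit_index_size_from_bytes(2)   # 2-byte data group unit (claim 12)
eight_bit_unit = bit_index_size_from_bits(8)   # 8-bit bit-group value (claim 19)
```

Under this reading the two formulas agree whenever the unit sizes agree, since an n-byte unit is an 8n-bit unit and 256^n equals 2^(8n).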
Claim 24
Claim 1 is included; Malik discloses wherein segmenting the data file comprises segmenting the data file into data segments that have a size that is a function of the number of bits or bytes defining the data group units (col 5, line 8-10, “... the data set is broken into segments via step 102. In a preferred embodiment, each segment has the same size...”)
Claim 25
Claim 1 is included; Malik discloses wherein segmenting the data file comprises segmenting the data file into data segments that each have a size that is based on the total number of unique values representable by the data group unit and the number of bits or bytes defining the data group units (col 5, line 8-10, “... the data set is broken into segments via step 102. In a preferred embodiment, each segment has the same size...”)
Claim 26
Claim 1 is included; Malik discloses wherein segmenting the data file comprises segmenting the data file into data segments that each have a size that is sufficient to contain all unique values representable by the data group unit (col 6, line 20-21, “... the source data file 210 has been separated into three segments 212, 214, and 216. Each segment 212, 214, and 216 includes ten subsegments 218...” <examiner note: file 210 is segmented, with 10 possible subsegments per segment>)

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4 and 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Malik (U.S. Patent 6438556 B1), as applied to claim 1, and further in view of Isaacson (U.S. Pub 2011/0016136 A1).
Claim 2
Claim 1 is included; however, Malik does not explicitly disclose further comprising processing the bit index for each data segment to generate a count value for each data segment representing the number of unique data group unit values in the data segment.
Isaacson discloses processing the bit index for each data segment to generate a count value for each data segment representing the number of unique data group unit values in the data segment ([0102], line 1-2, “... The data stream 30 in FIG. 9 represents the original series of bits in the stored file...” [0108], line 1-2, “... After completion of the tuple array, we are ready to look for the tuples in the data stream 30...” [0110], “... we have gathered statistics for how many times each tuple appears in the data stream 30...” <examiner note: the tuple array includes the unique data group unit values (e.g., 0>0, 0>1, 1>0, and 1>1) in the data stream 30. FIG. 11, table 40 shows the count values for tuples 0>0, 0>1, 1>0, and 1>1 in the data stream. For instance, 00 has a count of 95>)
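As a point of reference for the claim 2 language, a count value "representing the number of unique data group unit values in the data segment" can be pictured as the number of set bits in a presence-style bit index. The sketch below assumes the bit index is held as a Python integer with one bit per value; it illustrates the claim wording only and is not taken from Malik or Isaacson, and the function name is hypothetical.

```python
def count_unique_values(bit_index: int) -> int:
    """Count the set bits of a presence bit index (one bit per unique value)."""
    return bin(bit_index).count("1")


# Hypothetical index with bits 0, 1, and 3 set: three unique values are present.
count_value = count_unique_values(0b1011)
```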
Claim 3
Claim 2 is included; Isaacson discloses further comprising selectively applying Huffman or similar indexing to one or more of the data segments depending on their respective count values to generate one or more Huffman or similar indexes for one or more of the data segments ([0116], “... we again count the number of instances of each of the symbols in the current alphabet (now having "0," "1" and "2.") The total symbol count in the data stream is 288 symbols as seen in table 41', FIG. 15. We also have one end-of-file (EOF) symbol at the end of the data stream (not shown)....” [0117], “... Next, we use the counts to build a Huffman binary code tree. 1) List the symbols from highest count to lowest count. 2) Combine the counts for the two least frequently occurring symbols in the dictionary. This creates a node that has the value of the sum of the two counts. 3) Continue combining the two lowest counts in this manner until there is only one symbol remaining. This generates a Huffman binary code tree....” <examiner note: using the counts of each of the tuples 0>0, 0>1, 1>0, and 1>1, the most frequently occurring tuple is replaced with a new symbol (e.g., 2), and a Huffman binary code tree is generated>)
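The tree-building steps quoted from Isaacson [0117] (repeatedly combine the two lowest counts until one node remains) follow the standard Huffman construction, which can be sketched as below. This is a generic Python sketch using the standard-library heapq module, not Isaacson's code; the per-symbol counts are hypothetical, since the Office action gives only the total of 288 symbols, and the function name is an assumption.

```python
import heapq
from itertools import count


def huffman_codes(counts: dict) -> dict:
    """Build Huffman codes from symbol counts by merging the two lowest counts."""
    if not counts:
        return {}
    tiebreak = count()  # unique sequence numbers keep heap entries comparable
    heap = [(n, next(tiebreak), sym) for sym, n in counts.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # lowest count
        n2, _, right = heapq.heappop(heap)   # second-lowest count
        # The new node carries the sum of the two counts.
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical counts for the three-symbol alphabet "0", "1", "2".
codes = huffman_codes({"0": 5, "1": 2, "2": 1})
```

With these made-up counts, the most frequent symbol receives a one-bit code and the two rarer symbols receive two-bit codes.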
Claim 4
Claim 3 is included; Isaacson further discloses selectively applying Huffman or similar indexing to one or more of the data segments if it is determined that their respective count values indicate that the application of Huffman or similar indexing will be effective in generating an output data file comprising the bit indexes in combination with the Huffman or similar indexes that are smaller or compressed relative to the original data file, and wherein determining the effectiveness of applying Huffman or similar indexing is based on whether the count values of the bit indexes representing the data segments fall within a predetermined range ([0123], “... Finally, we compare the original number of bits (384, FIG. 12) to the current number of bits (508) that are needed for this compression pass. We find that it takes 1.32 times as many bits to store the compressed data as it took to store the original data, table 58, FIG. 19. This is not compression at all, but expansion...” [0142], “... Finally, we compare the original number of bits (384) to the current number of bits (382) that are needed for this compression pass. We find that it takes 0.99 times as many bits to store the compressed data as it took to store the original data. Compression is achieved...” <examiner note: the effectiveness of applying Huffman or similar indexing based on counting of unique tuples in the data stream is within a range of 0%-100%>)
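The two bit-count comparisons quoted from Isaacson [0123] and [0142] can be reproduced with simple arithmetic. The sketch below uses only the bit counts quoted above (384 original bits; 508 and 382 bits after the respective passes); the function names and the "ratio below 1" test for effectiveness are assumptions made for illustration.

```python
def bit_ratio(original_bits: int, current_bits: int) -> float:
    """Ratio of bits needed after a pass to bits in the original data."""
    return current_bits / original_bits


def pass_achieves_compression(original_bits: int, current_bits: int) -> bool:
    """A pass compresses only if it needs fewer bits than the original."""
    return current_bits < original_bits


first_pass = round(bit_ratio(384, 508), 2)   # expansion quoted from [0123]
later_pass = round(bit_ratio(384, 382), 2)   # compression quoted from [0142]
```

The rounded ratios match the 1.32 and 0.99 figures in the quoted passages.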
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate into Malik the iterative substitution of symbols for frequently occurring tuples/unique data unit values in a data stream, and the application of a Huffman scheme to the data stream to generate Huffman codes, as disclosed by Isaacson, because the bit indexes/strings of bits 242, 244, and 246 are generated in only a first pass. Isaacson iteratively identifies and replaces frequently occurring data unit values with symbols. After each pass, the compression efficiency is calculated so that the size of the output/compressed file is smaller than that of the original file.
Claim 6
Claim 2 is included; Isaacson discloses applying Huffman or similar indexing to one or more of the data segments based on their respective count value to generate Huffman or similar indexes ([0116], “... we again count the number of instances of each of the symbols in the current alphabet (now having "0," "1" and "2.") The total symbol count in the data stream is 288 symbols as seen in table 41', FIG. 15. We also have one end-of-file (EOF) symbol at the end of the data stream (not shown)....” [0117], “... Next, we use the counts to build a Huffman binary code tree. 1) List the symbols from highest count to lowest count. 2) Combine the counts for the two least frequently occurring symbols in the dictionary. This creates a node that has the value of the sum of the two counts. 3) Continue combining the two lowest counts in this manner until there is only one symbol remaining. This generates a Huffman binary code tree....” <examiner note: Huffman scheme in FIG. 16>)
Claim 7
Claim 3 is included; Isaacson further discloses generating an output data file comprising the bit indexes and the Huffman or similar indexes that collectively represent the original data file ([0196], line 1-2, “... To store the encoded data, we replace the symbol with its matching Huffman code and write the bits to the media...” “... The compressed bit string for the data, without spaces is: 01000011111111111111111111111111101100111011001111111101100101100011000110001100011000101101010...” <examiner note: Huffman codes are in place of symbols in the data stream>)




Response to Arguments
Section 35 U.S.C. 112 – pg. 6
The rejections of claims 25 and 26 are withdrawn in view of the amendment.
Section 102/103
	On pg. 7, Applicant wrote “... The compressed version of the datafile then encodes each of the segments 212, 214, and 216 in a string of bits, where each bit ...”
	Examiner disagrees with this interpretation. There is no passage in Malik stating that the compressed version of the data file encodes each of the segments 212, 214, and 216 in a string of bits.
	The data file 210 is split into three segments 212, 214, and 216. Each segment has 10 data groups 218 (e.g., 1011011101, 0110010011, and so on). The three segments have a total of 30 data groups 218. Dictionary 220 (i.e., a dictionary coder, a lossless data compression algorithm) is used to represent the data groups 218 as two-bit code words (col 6, line 36-36). The data file 210 is compressed into a compressed file 240 using the association between the two-bit code words and the data groups 218. In other words, each data group 218 (i.e., 10 bits long) is replaced by a two-bit code word. As a result, the compressed file 240 is much smaller than the original data file 210.
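The dictionary substitution described in the preceding paragraph can be sketched as follows. This is an illustrative Python sketch, not code from Malik; only the pairings 1011011101 -> 0 and 1101001011 -> 3 are taken from this Office action, while the other two dictionary entries, the three-subsegment sample, and the function names are hypothetical.

```python
SUBSEGMENT_BITS = 10   # each data group 218 is 10 bits long
CODE_WORD_BITS = 2     # each code word is two bits long

dictionary = {
    "1011011101": 0,   # pairing stated in the Office action
    "0110010011": 1,   # hypothetical pairing
    "0000011111": 2,   # hypothetical entry
    "1101001011": 3,   # pairing stated in the Office action
}


def encode_segment(subsegments, table):
    """Replace each 10-bit subsegment with its code word, keeping position."""
    return [table[s] for s in subsegments]


def to_bits(code_words, width=CODE_WORD_BITS):
    """Pack code words into a bit string, `width` bits per code word."""
    return "".join(format(c, "0{}b".format(width)) for c in code_words)


sample = ["1011011101", "0110010011", "1101001011"]  # hypothetical sample
code_words = encode_segment(sample, dictionary)
packed = to_bits(code_words)

# Size comparison for 30 data groups, as described above.
original_bits = 30 * SUBSEGMENT_BITS
compressed_bits = 30 * CODE_WORD_BITS
```

With 30 data groups, the substitution reduces 300 bits of source data to 60 bits of code words, not counting the dictionary itself.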

[Image: media_image2.png]



Applicant argues on pg. 9:

[Image: media_image3.png]

Applicant’s argument has been considered; however, examiner respectfully disagrees.
“...wherein the offset position of each bit in the bit index corresponds or is associated to the unique data group unit value that the bit represents...”
	Let us take a closer look at bit string 242: its first position holds the first two-bit code word, 0. This code word 0 is associated with data group unit value 1011011101 of the first data group 218 in segment 212, as shown below.

[Image: media_image1.png]

[Image: media_image4.png]

	Therefore, Malik clearly meets the claim limitation.




[Image: media_image5.png]

It is unclear what the nature of the configurable or preset “data group unit” is. Neither the specification nor Applicant’s argument provides any definition of “...the nature of the configurable or preset “data group unit”...”
Applicant argues on pgs. 22-23:

[Image: media_image6.png]

[Image: media_image7.png]

Examiner respectfully disagrees because Malik clearly teaches “...wherein the offset position of each bit in the bit index corresponds or is associated to the unique data group unit value that the bit represents...” (claim 22). See the explanation above.
Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAU HAI HOANG whose telephone number is (571)270-5894.  The examiner can normally be reached on 1st biwk: Mon-Thurs 7:00 AM-5:00 PM; 2nd biwk: Mon-Thurs: 7:00 am-5:00pm, Fri: 7:00 am - 4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


HAU HAI HOANG
Examiner
Art Unit 2167



/HAU H HOANG/Examiner, Art Unit 2167