DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

 

Claim Objections
Claims 1-2, 5, 8, 13-14 and 16-17 are objected to due to the following exemplary informalities: Each of these claims contains extraneous graphic characters (i.e., “bullet points”).      
Claims 9-10 and 16 are objected to due to the following exemplary informalities: Each of these claims is not terminated with a period.    
Claims 13-14 and 16-17 are objected to due to the following exemplary informalities: Each of these claims contains extraneous references to drawing elements.
Applicant is respectfully reminded to review the specification, abstract, claims, and drawings for all informalities.  Appropriate correction is required.


Claim Rejections – 35 U.S.C. § 101
35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 18-22 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding claims 18-20:  Each of these claims is directed to a “computer readable medium”.  Such a medium was never defined in the Specification.  Therefore, the claim has been interpreted as an attempt to claim a signal.
During examination, the PTO is obliged to give claims their broadest reasonable interpretation consistent with the specification.  See In re Zletz, 893 F.2d 319 (Fed. Cir. 1989) (during patent examination the pending claims must be interpreted as broadly as their terms reasonably allow). When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter.  See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and “Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101,” Aug. 24, 2009, p. 2.
The broadest reasonable interpretation of a claim drawn to a computer usable medium typically covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media, particularly when the specification is silent. See MPEP § 2111.01.  The same is true even when the computer medium is limited to a “storage” medium.  See Ex parte Mewherter, No. 2012-007692, p. 6-14 (PTAB May 8, 2013) (precedential) (providing a “growing body of evidence … demonstrating that the ordinary and  
The Office suggests reciting “non-transitory computer-readable media”.  See Subject Matter Eligibility of Computer Readable Media, 1351 OG 212 (February 23, 2010).



Additionally regarding claims 21-22:  It is not clear which statutory subject matter class “support data” falls within.  It is possibly software per se, which is not a process, machine, manufacture, or composition of matter.






Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 14 and 21-22 are rejected under 35 U.S.C. § 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 14:  
Independent claim 14 (system/apparatus/encoder) purports to be an independent claim, yet it refers back to claim 12 (a method claim).  It also mixes statutory subject matter categories.
Therefore, the scope of the claim is ambiguous.

Regarding claims 21 and 22:  
It is not clear specifically what “support data” means.  How is “support” being supplied/implemented?  
Therefore, the scope of each claim is ambiguous.



Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1-4, 8-10, 18-19 and 21-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”).

Regarding claim 1, Bestgen discloses a method for encoding genome sequence data ([0003], Relational databases are being considered for storing sequence data that has the characteristics of long repeating sequences generally consisting of a relatively small number of distinct values. …. An example of such sequence data is genomic sequence data; [0011], Embodiments of the invention provide a method, apparatus and program product that utilize an Encoded Matrix Index data structure that may be used by database queries to address deficiencies in searching sequence type data stored in relational databases. An Encoded Matrix Index is based in part on the premise of storing reference values for portions of sequence data; [0031], FIG. 13 shows an exemplary table containing genome sequence data and an associated Encoded Matrix Index control data structure), said genomic encoder (1115) comprising:  reads of sequences and nucleotides ([0004], Genomic sequence data is typically represented as long sequences of letters or numbers each having a small number of distinct values. … Base pairs are two nucleotides on opposite complementary DNA or RNA strands connected via a hydrogen bond. Each nucleotide is typically represented by the letters A, C, G, and T), said method comprising the steps of:  partitioning said reads into clusters of reads which share a common sequence or subsequence of nucleotides called “cluster signature” ([0011], An Encoded Matrix ), encoding said clustered reads as a multiplicity of blocks of syntax elements ([0037], As noted above, embodiments of the invention may be used in connection with storing and accessing sequence type data in a database table. 
To address concerns such as using a relational table to store sequence data in the form of large strings where a significant portion of the strings contains overlapping sequence data, or storing sequence data across multiple columns or multiple rows (i.e., multiple blocks), each having a portion of the sequence data), and structuring said blocks of syntax elements with header information thereby creating successive Access Units ([0011], storing additional information regarding the variation of specific sequences relative to those reference values. By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in the index may be greatly reduced as compared to storing the sequences themselves. Additionally, the sequences can be easily reconstructed based upon the reference values and the variations therefrom when the sequence data or portions thereof are retrieved from the index; [0014], Embodiments of the invention may execute a database query using the Encoded Matrix Index. In response to a database query that includes a term associated with a sub-column defined in a column of a database table, a variation data structure is accessed from the Encoded Matrix Index to determine whether any variation exists between rows belonging to the sub-column of the database table. If no variation exists, a value is accessed from the reference data structure from the Encoded Matrix Index, which is associated with the sub-column, and a determination ).


Regarding claim 2, Bestgen discloses the method of claim 1, wherein said clusters signatures are encoded by associating each nucleotide of the supported alphabet to a unique binary representation, and concatenating said binary representations of each nucleotide in a signature to obtain a bitstring representing the encoded structure ([0012], The element may include an array having a plurality of values for each row in the database table, or the element may include a single value for each row in the database table. In some embodiments, the variation data structure may be a binary structure having a bit for each element of the column, where each bit within the structure indicates a variation for a corresponding element; [0036], A database engine may use the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. Similar to a bitmap index, intermediate dynamic bitmaps can be AND'ed and OR'ed together to satisfy an ad hoc query (a bitstring)).
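
For illustration only, the limitation recited in claim 2 (one unique binary representation per nucleotide, concatenated into a bitstring) can be sketched as follows. This is a minimal sketch and is not taken from the application or from Bestgen: the 2-bit mapping and the names NUCLEOTIDE_BITS, encode_signature and decode_signature are assumptions chosen for the example.

```python
# Hypothetical 2-bit code per nucleotide (an assumption for illustration;
# neither the application nor Bestgen is known to use this exact mapping).
NUCLEOTIDE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_NUCLEOTIDE = {bits: nt for nt, bits in NUCLEOTIDE_BITS.items()}
CODE_WIDTH = 2  # bits per nucleotide in this sketch


def encode_signature(signature: str) -> str:
    """Concatenate the binary representation of each nucleotide."""
    return "".join(NUCLEOTIDE_BITS[nt] for nt in signature)


def decode_signature(bitstring: str) -> str:
    """Map each fixed-width group of bits back to its nucleotide."""
    return "".join(
        BITS_NUCLEOTIDE[bitstring[i:i + CODE_WIDTH]]
        for i in range(0, len(bitstring), CODE_WIDTH)
    )


print(encode_signature("ACGT"))  # 00011011
```

Because every code has the same width in this sketch, the bitstring can be decoded without separators, which corresponds to the decoding step recited in claim 8.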

Regarding claim 3, Bestgen discloses the method of claim 2, wherein each cluster of encoded sequence reads is identified by said encoded structure ([0012], the element may include a single value for each row in the database table; [0014], the value data structure from the Encoded Matrix Index is accessed and a value is identified for each row of the sub-column to determine which elements of the sub-column, if any, match the term in the database query. In some embodiments the value data structure includes a bitmap mapping of the value for each row of the sub-column).


Regarding claim 4, Bestgen discloses the method of claim 3, wherein said blocks of syntax elements comprise a master index table, comprising cluster signatures encoded as per claim 2 and associated to a vector of integer values representing positions on a storage medium of the blocks of encoded syntax elements representing the sequence reads belonging to each cluster ([0014], Embodiments of the invention may execute a database query using the Encoded Matrix Index. In response to a database query that includes a term associated with a sub-column defined in a column of a database table, a variation data structure is accessed from the Encoded Matrix Index to determine whether any variation exists between rows belonging to the sub-column of the database table; [0036], an encoded vector index ("EVI"). An EVI is a data structure that is made up of two primary components: a symbol table and a vector. The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. The statistical information typically includes a numeric `gray` code identifying the key, the first and last rows where the key is found, and the number of times the key appears in the table. The vector corresponds to the actual rows in the table and contains a list of byte codes indicating which key each row contains).
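
For illustration only, the master index table recited in claim 4 (encoded cluster signatures, each associated with a vector of integer positions of the corresponding blocks) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name build_master_index, the example signatures and the byte offsets are hypothetical and do not come from the application or from Bestgen.

```python
from collections import defaultdict


def build_master_index(blocks):
    """Map each encoded signature to the positions of its blocks.

    `blocks` is an iterable of (encoded_signature, offset) pairs, where
    `offset` is an integer position on the storage medium (hypothetical).
    """
    index = defaultdict(list)
    for signature_bits, offset in blocks:
        index[signature_bits].append(offset)
    return dict(index)


# Hypothetical blocks: two belong to the cluster with signature "0001".
blocks = [("0001", 0), ("1011", 128), ("0001", 256)]
master_index = build_master_index(blocks)
print(master_index["0001"])  # [0, 256]
```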


Regarding claim 8, Bestgen discloses  encoding of raw, unmapped and unaligned sequence reads encoded as per claim 1, and further discloses a method for decoding genomic data ([0011], By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in the index may be greatly reduced as compared to storing the sequences themselves. Additionally, the 
SELECT Patient_ID WHERE Sequence..Nucleotide[2]="A" AND Sequence..Nucleotide[4]="T" FROM Schema/SeqData 
The sequence column in table 300 has a sub-column named Nucleotide defined in the EMI structure in FIG. 13 and FIG. 14. The query values are mapped to bitmap representations using the map data structure 322 associated with the Nucleotide sub-column … The mapped predicate value "11" is now compared to the bitmap values for each of the rows in the value vector 334, with a match being found at row 2. Row 2 is returned and when ANDed with the results from the first predicate, selects the Patient_ID "9002" from row 2 of table 300), decoding said multiplicity of blocks of syntax elements to extract raw, unmapped and unaligned reads ([0060], In the above query, no values from the "sequence" column were returned in the query results. If the query had asked that the values be returned, the EMI engine would use the result of the bitmap AND or OR processing ([0037], As noted above, embodiments of the invention may be used in connection with storing and accessing sequence type data in a database table. To address concerns such as using a relational table to store sequence data in the form of large strings where a significant portion of the strings contains overlapping sequence data, or storing sequence data across multiple columns or multiple rows, each having a portion of the sequence data … the illustrated embodiments utilize a different type of index, referred to herein as an Encoded Matrix Index (EMI), to better accommodate sequence data having many unique values for a particular column, but having relatively few unique values within a sub-column; [0038], An Encoded Matrix Index (EMI) is a data structure that includes at least a reference data structure, a variation data structure and value data structure. 
The reference data structure generally contains a reference value used to compare against contents of portions of a column in a database table), decoding said clusters signatures by associating to each binary representation of the signature the corresponding sequence of nucleotides ([0060], The element ([0058], Once the position has been determined, the variation vector data structure is referenced check for a variation … the value vector is retrieved that corresponds to the sub-unit (block 172). A dynamic bitmap may then be created by comparing the query value against the retrieved value vector elements (block 174); [0068], The sequence column in table 300 has a sub-column named Nucleotide defined in the EMI structure in FIG. 13 and FIG. 14. The query values are mapped to bitmap representations using the map data structure 322 associated with the Nucleotide sub-column … The mapped predicate value "11" is now compared to the bitmap values for each of the rows in the value vector 334, with a match being found at row 2. Row 2 is returned and when ANDed with the results from the first predicate, selects the Patient_ID "9002" from row 2 of table 300), and extracting multiple blocks of syntax elements from the Access Units by employing header information ([0035], multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. B+ trees, R-trees, and bitmaps. A B+ tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key; [0036], The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. 
The statistical information …; [0039], The data structure for the sub-columns may also store information for each sub-column … the position of the first sub-unit within the sub-units data structure, and a pointer to the maps data structure for the sub-column).


Regarding claim 9, Bestgen discloses the method of claim 8, further comprising decoding a genomic dataset header containing global configuration parameters  ([0058], Once the position has been determined, the variation vector data structure is referenced check for a variation. If the corresponding bit in the variation vector is off ("Yes" branch of decision block 164), then the offset and the length of the sub-column are stored (block 166) and used to retrieve the value from the reference data structure (block 168) … Because there is no variation for this sub-column among all of the rows, a global result can be created for this sub-column (block 170)).


Regarding claim 10, Bestgen discloses the method of claim 9, further comprising decoding a master index table containing coded clusters signatures and coded blocks offsets  ([Abstract], A method, apparatus, and program product are provided for creating an Encoded Matrix Index for a column in a database table. An element of the column for all rows in the database table is compared to a corresponding reference value in a reference data structure, and in response to at least one value for the element of the column not matching the reference …; [0011], the sequences can be easily reconstructed based upon the reference values and the variations therefrom when the sequence data or portions thereof are retrieved from the index; [0036], The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. The statistical information typically includes a numeric `gray` code identifying the key, the first and last rows where the key is found, and the number of times the key appears in the table. The vector corresponds to the actual rows in the table and contains a list of byte codes indicating which key each row contains; [0058], Once the position has been determined, the variation vector data structure is referenced check for a variation. If the corresponding bit in the variation vector is off ("Yes" branch of decision block 164), then the offset and the length of the sub-column are stored (block 166) and used to retrieve the value from the reference data structure (block 168). For example, if the offset for the sub-column is two and the length of the sub-column is four, then the value is retrieved from the second through fifth positions in the reference data structure).
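
For illustration only, the offset and length arithmetic quoted from Bestgen [0058] (offset two, length four, yielding the second through fifth positions) can be sketched as follows. This is a minimal sketch: the function name read_sub_column and the reference string are hypothetical, and the 1-based offset convention is inferred from the quoted example rather than stated by Bestgen.

```python
def read_sub_column(reference: str, offset: int, length: int) -> str:
    """Return `length` values starting at 1-based position `offset`."""
    start = offset - 1  # convert the 1-based offset to a 0-based index
    return reference[start:start + length]


# Hypothetical reference value; not taken from Bestgen.
reference = "GATTACAGGC"

# Offset 2, length 4 selects positions 2, 3, 4 and 5.
print(read_sub_column(reference, 2, 4))  # ATTA
```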


Claim 18 is substantially similar to claim 2, and therefore likewise rejected.

Claim 19 is substantially similar to claim 2, and therefore likewise rejected.

Claim 21 is substantially similar to claim 1, and therefore likewise rejected.

Claim 22 is substantially similar to claim 2, and therefore likewise rejected.




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
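1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.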



Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Ganeshalingam et al (US Patent Application Publication No. 2012/0230338, hereafter referred to as “Ganeshalingam”) and Kumar et al (US Patent Application Publication No. 2015/0234870, hereafter referred to as “Kumar”).

Regarding claim 5, Bestgen discloses the method of claim 4 wherein said blocks of syntax elements comprise additional information (such as in a header) related to the sequence ([0011], storing additional information regarding the variation of specific sequences relative to those reference values), comprising a dataset group identifier used to uniquely identify each dataset group, a genomic dataset identifier used to uniquely identify each dataset ([0068], consider a query such as the following:  SELECT Patient_ID WHERE Sequence..Nucleotide[2]="A" AND Sequence..Nucleotide[4]="T" FROM Schema/SeqData.  The sequence column in table 300 has a sub-column named Nucleotide defined in the EMI structure in FIG. 13 and FIG. 14.  The query values are mapped to bitmap representations using the map data structure 322 associated with the Nucleotide sub-column … The mapped predicate value "11" is now compared to the bitmap values for each of the rows in the value vector 334, with a match being found at row 2.  Row 2 is returned and when ANDed with the results from the first predicate, selects the Patient_ID "9002" from row 2 of table 300), a flag signaling the presence of block headers, a flag signaling the order in which Access Units are stored on a storage medium in order to facilitate data access when decoding said Access Units ([0039], the data structure for the sub-columns may also store information for each sub-column … the position of the first sub-unit within the sub-units data structure, and a pointer to the maps data structure for the sub-column), the number of bits used to represent the 
Bestgen fails to explicitly disclose a header comprising a brand identifier used to identify the data format specification the dataset complies with, a minor version number used to identify the data format specification the dataset complies with, a flag signaling the presence of paired end reads, the number of reference sequences used to code the dataset, a numeric identifier per each reference sequence used to uniquely identify each reference sequence, a string identifier per reference sequence used to uniquely identify each reference sequence, the number of coded Access Units per reference sequence used to count the Access Units associated to each reference sequence, the type of coded genomic data used to distinguish among aligned reads, unaligned reads, unmapped reads and reference sequences, the number of data classes coded in the dataset, the number of descriptors used per each data class coded in the dataset used during the decoding process, the total number of clusters used to index encoded unmapped reads.

Ganeshalingam teaches genomic sequence databases ([0007]  In one aspect the disclosure relates to a method of conveying biological sequence data.  The method includes generating a data packet including a first header containing network routing information, a second header containing header information pertaining to the biological 

Kumar teaches relational genomic databases ([0003], instead of storing all genetic data, which can result in significant duplications and/or overlap, only variations are stored along with a corresponding reference genome; [0004], the text file is arranged in an extensible format and includes a plurality of metadata lines, a header line, and a plurality of content lines.  At least one key is retrieved from the content lines.  For each key, a data type and a number from at least one metadata table is retrieved.  Using a combination of each key and the corresponding data type and number, at least one column title is derived; [0005], a lookup table can be generated for storing a mapping of fields from the text file to the content tables.   The at least one column title of the at least one column can be associated with the corresponding key contained in the content tables), wherein headers are used to relate sequence data to various other data types relevant to the sequence data ([0042], VCF can provide variant location by source, chromosome, and position, and the details can differ for both position and source.  The file can contain metadata lines 410, a header line 420, and content lines 430.  The metadata lines 410 can begin with “##CHROM” and can provide the column structure for the data in the content lines 430.  The VCF file may not be provided in a relational database friendly format. In order to convert the information into a relational database format, detailed mapping strategies are required. The VCF file can include diverse data types, fields presented in a non-standard form, as well as specific symbols (for example "##", and "|"), which have specific non-standard database nomenclature--all of which may need to be mapped in order to convert the information from the VCF file into a relational database format.  The at least one key from a column associated with a format parameter in the header line can be retrieved. Corresponding to each key, at least one corresponding value can be retrieved from at least one column associated with data parameters from the header line. The at least one corresponding value can be located in a similar parallel physical location as the corresponding key in the column associated with a format parameter from the header line. Each value retrieved from at each content line can be mapped to a new row in the at least one column associated the key in the content table having the dynamic structure).

Bestgen discloses the claimed invention except for the specific arrangement and/or content of information set forth in the claims.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to provide any type of displayed/indicated information in the header since such an information header depends only on the intended use of the apparatus/system and the desired information to be displayed.  The motivation for doing so would be to store compressed sequencing data that can be readily decompressed or decoded as taught by Bestgen ([0011], By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in the index may be greatly reduced as compared to storing the sequences themselves. Additionally, the sequences can be easily reconstructed based upon the reference values and the variations therefrom when the sequence data or portions thereof are retrieved from the index).






Claim 6 is rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Ganeshalingam et al (US Patent Application Publication No. 2012/0230338, hereafter referred to as “Ganeshalingam”), Kumar et al (US Patent Application Publication No. 2015/0234870, hereafter referred to as “Kumar”) and Gnirke et al (US Patent Application Publication No. 2014/0228223, hereafter referred to as “Gnirke”).

Regarding claim 6, modified Bestgen discloses the method of claim 5.  Bestgen fails to explicitly disclose wherein said genomic reads are paired.
Gnirke teaches a database of genomic sequencing reads wherein said genomic reads are paired ([0006], The present invention is related to genomic nucleotide sequencing. In particular, the invention describes a paired end sequencing method that improves the yield of unique read pairs that are far (i.e., for example, 1-1000 kb) apart in the genome; [0296], Resembl.RTM. is an extended version of Ensembl.RTM. and allows storage, query and viewing of Illumina resequencing data in a genomic context. The data and re-sequencing datasets from ELAND.RTM.-based alignment data can be loaded into Resembl.RTM. databases and Resembl.RTM. websites can be used for interactive data mining and QC content. The Resembl.RTM. back-end database may be designed to allow storage 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Bestgen with the teaching of Gnirke for the purpose of providing efficient sequence alignment and mapping of complex genomes as taught by Gnirke ([0004], However, further improvements are necessary to improve the efficiency of these massively parallel sequencing systems to enable routine sequencing and assembly of complex genomes de novo (i.e., without a pre-existing reference sequence). Essentially all methods for assembling genomes de novo require pairs of sequencing reads that have an a priori defined orientation and spacing in the underlying genome. Long-distance (i.e., for example 30-45 kb) read pairs are particularly important to provide long-range contiguity of genome assemblies. Without such long-distance read pairs, genome assemblies remain highly fragmented. Approaches that improve the yield of long-distance read pairs by massively-parallel sequencing and thus the quality of genome assemblies would greatly facilitate biological and medical research.).




Claims 7 and 20 are rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Ganeshalingam et al (US Patent Application Publication No. 2012/0230338, hereafter referred to as “Ganeshalingam”), Kumar et al (US Patent Application Publication No. 2015/0234870, hereafter referred to as “Kumar”), Gnirke et al (US Patent Application Publication No. 2014/0228223, hereafter referred to as “Gnirke”) and Kärkkäinen et al [Gurulogic Microsystems Oy] (Foreign Patent Application Publication No. WO 2015/197201 A1, hereafter referred to as “Gurulogic”).

Regarding claim 7, modified Bestgen discloses the method of claim 6.  Bestgen fails to explicitly disclose wherein said genomic data are entropy coded.  
Gurulogic teaches wherein genomic data are entropy coded (page 3 lines 1-7,  in a first aspect, embodiments of the present disclosure provide an encoder including processing hardware for encoding  input data (D1) to generate corresponding encoded data (E2).  The processing hardware is operable to process the input data (D1) as data blocks and/or data packets.  Optionally, the input data (D1) is in a form of at least one of:  text data … genomic data, multidimensional data and/or one-dimensional data, but not limited thereto; page 26 line 35 – page 27 line 2, when the encoded data (E2) is entropy-coded with an advanced range coding method that is based on arithmetic compression, compressed data (C4) so generated requires only 113 bytes (= 904 bits) for communicating over the communication network.  Correspondingly, when the input data (D1) is entropy-coded in a similar manner, compressed input data so generated requires 253 bytes (= 2024 bits).  Thus, an amount of input data (D1) without any compression is 270 bytes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.

Claim 20 is substantially similar to claim 7, and therefore likewise rejected.  




Claim 11 is rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Gnirke et al (US Patent Application Publication No. 2014/0228223, hereafter referred to as “Gnirke”).

Regarding claim 11, modified Bestgen discloses the method of claim 10.  Bestgen fails to explicitly disclose wherein said genomic reads are paired.
Gnirke teaches a database of genomic sequencing reads wherein said genomic reads are paired ([0006], The present invention is related to genomic nucleotide sequencing. In particular, the invention describes a paired end sequencing method that improves the yield of unique read pairs that are far (i.e., for example, 1-1000 kb) apart in the genome; [0296], Resembl.RTM. is an extended version of Ensembl.RTM. and allows storage, query and viewing of Illumina resequencing data in a genomic context. The data and re-sequencing datasets from ELAND.RTM.-based alignment data can be loaded into Resembl.RTM. databases and Resembl.RTM. websites can be used for interactive data mining and QC content. The Resembl.RTM. back-end database may be designed to allow storage and retrieval of the large amounts of re-sequencing data in an efficient way. It supports paired-end alignments at high coverage, as well as per-base and summary-type data on coverage 
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teaching of Gnirke for the purpose of providing efficient sequence alignment and mapping of complex genomes as taught by Gnirke ([0004], However, further improvements are necessary to improve the efficiency of these massively parallel sequencing systems to enable routine sequencing and assembly of complex genomes de novo (i.e., without a pre-existing reference sequence). Essentially all methods for assembling genomes de novo require pairs of sequencing reads that have an a priori defined orientation and spacing in the underlying genome. Long-distance (i.e., for example 30-45 kb) read pairs are particularly important to provide long-range contiguity of genome assemblies. Without such long-distance read pairs, genome assemblies remain highly fragmented. Approaches that improve the yield of long-distance read pairs by massively-parallel sequencing and thus the quality of genome assemblies would greatly facilitate biological and medical research.).




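Claim 12 is rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Gnirke et al (US Patent Application Publication No. 2014/0228223, hereafter referred to as “Gnirke”) and Kärkkäinen et al [Gurulogic Microsystems Oy] (Foreign Patent Application Publication No. WO 2015/197201 A1, hereafter referred to as “Gurulogic”).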

Regarding claim 12, modified Bestgen discloses the method of claim 11.  Bestgen fails to explicitly disclose wherein said genomic data are entropy decoded.  
Gurulogic teaches wherein genomic data are entropy coded and decoded (page 1 lines 1-10, the present disclosure relates generally to data compression, and more specifically, to encoders for encoding input data (D1) to generate corresponding encoded data (E2), and decoders for decoding the encoded data (E2) to generate corresponding decoded data (D3).  Moreover, the present disclosure relates to methods of encoding input data (D1) to generate corresponding encoded data (E2), and methods of decoding the encoded data (E2) to generate corresponding decoded data (D3); page 3 lines 1-7, in a first aspect, embodiments of the present disclosure provide an encoder including processing hardware for encoding input data (D1) to generate corresponding encoded data (E2).  The processing hardware is operable to process the input data (D1) as data blocks and/or data packets.  Optionally, the input data (D1) is in a form of at least one of:  text data … genomic data, multidimensional data and/or one-dimensional data, but not limited thereto; page 26 line 35 – page 27 line 2, when the encoded data (E2) is entropy-coded with an advanced range coding method that is based on arithmetic compression, compressed data (C4) so generated requires only 113 bytes (= 904 bits) for communicating over the communication network.  Correspondingly, when the input data (D1) is entropy-coded in a similar manner, compressed input data so generated requires 253 bytes (= 2024 bits).  Thus, an amount of input data (D1) without any compression is 270 bytes).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.






Claims 13, 16 and 17 are rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Kärkkäinen et al [Gurulogic Microsystems Oy] (Foreign Patent Application Publication No. WO 2015/197201 A1, hereafter referred to as “Gurulogic”) and Lee et al. (US Patent Application Publication No. 2012/0170595, hereafter referred to as “Lee”).

Regarding claim 13, Bestgen discloses a genomic encoder (1115) for the compression of raw, unmapped and unaligned genome sequence data (111), said genome sequence data (111) comprising reads of sequences of nucleotides ([0003], Relational databases are being considered for storing sequence data that has the characteristics of long repeating sequences generally consisting of a relatively small number of distinct values. …. An example of such sequence data is genomic sequence data; [0004], Genomic sequence data is typically represented as long sequences of letters or numbers each having a small number of distinct values. Storage and query of genomic sequence data is problematic in a relational database because of its size. The entire human genome consists of approximately 3 billion base pairs. Base pairs are two nucleotides on opposite complementary DNA or RNA strands connected via a hydrogen bond. Each nucleotide is typically represented by the letters A, C, G, and T; [0011], Embodiments of the invention provide a method, apparatus ; [0015], if any new variation exists as a result of the update between the sub-column and the reference data structure the variation data structure is updated to indicate the variation; [0031], FIG. 13 shows an exemplary table containing genome sequence data and an associated Encoded Matrix Index control data structure; [0061], When columns are inserted or updated within a table, the EMI engine is used to update any EMI associated with that table illustrated in the flowchart 250 in FIG. 12. 
If, for example, a column is inserted into the table, the EMI engine creates a variation vector, which includes one bit for every sub-column/sub-unit in the new column along with the offsets and lengths associated with the sub-units in the sub-unit data structure), said genomic encoder (1115) comprising:  a clustering unit (112), configured to partition said reads in a group of reads that share a common sequence or subsequence of nucleotides called clusters signatures thereby creating clusters of reads (113) and cluster signatures (114) ([0011], An Encoded Matrix Index is based in part on the premise of storing reference values for portions of sequence data that often represent the most common values found in a particular type of sequence data in a column of a database table, and storing additional information regarding the variation of specific sequences relative to those reference values. By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in the index may be greatly reduced as compared to storing the ), one or more descriptor encoding units (115) configured to encode said clustered reads as blocks of syntax elements ([0012], The element may include an array having a plurality of values for each row in the database table, or the element may include a single value for each row in the database table. In some embodiments, the variation data structure may be a binary structure having a bit for each element of the column, where each bit within the structure indicates a variation for a corresponding element; [0036], A database engine may use the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. 
Similar to a bitmap index, intermediate dynamic bitmaps can be AND'ed and OR'ed together to satisfy an ad hoc query (a bitstring)),  a signatures encoding unit (116) configured to binarize the clusters signatures (114) by associating a unique binary representation to each symbol of the clusters signatures ([0012], The element may include an array having a plurality of values for each row in the database table, or the element may include a single value for each row in the database table. In some embodiments, the variation data structure may be a binary structure having a bit for each element of the column, where each bit within the structure indicates a variation for a corresponding element; [0036], A database engine may use the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. Similar to a bitmap index, intermediate dynamic bitmaps ), a Genomic Dataset Header and Master Index Table generator (119) configured to associate said binarized cluster signatures (117) to a vector of integers expressing the offset on a storage medium of the entropy coded descriptors contained in the Genomic Access Units ([0058], Once the position has been determined, the variation vector data structure is referenced check for a variation. If the corresponding bit in the variation vector is off ("Yes" branch of decision block 164), then the offset and the length of the sub-column are stored (block 166) and used to retrieve the value from the reference data structure (block 168). For example, if the offset for the sub-column is two and the length of the sub-column is four, then the value is retrieved from the second through fifth positions in the reference data structure).

Bestgen fails to explicitly disclose one or more entropy encoding units (1110), configured to compress said blocks of syntax elements according to their statistical properties to produce Genomic Access Units (1111) or a multiplexer (1113) for multiplexing the compressed genomic data and metadata.  

Gurulogic teaches wherein genomic data are entropy encoded and decoded to compress said blocks of syntax elements according to their statistical properties to produce Genomic Access Units (page 1 lines 1-10, The present disclosure relates generally to data compression, and more specifically, to encoders for encoding input data (D1) to generate corresponding encoded data (E2), and decoders for decoding the encoded data (E2) to generate corresponding decoded data (D3).  Moreover, the present disclosure relates to methods of encoding input data (D1) to generate corresponding encoded data (E2), and methods of decoding the encoded data (E2) to generate corresponding decoded data (D3); page 3 lines 1-7, In a first aspect, embodiments of the present disclosure provide an encoder including processing hardware for encoding input data (D1) to generate corresponding encoded data (E2)).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.


Lee teaches a multiplexer (1113) for multiplexing and demultiplexing the compressed data and metadata ([0010], The present invention provides a method and apparatus in which, based on the length of each frame of a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and then multiplexed; [0011], The present invention also provides a method and apparatus, in which a stream generated by multiplexing bitstreams coded to have different frame lengths by a plurality of coders in a coding end is demultiplexed in order to detect the length of each frame of a reference bitstream and the remaining bitstreams are extracted using the detected length of each frame of the reference bitstream; [0086], based on the length of each frame of a bitstream selected as a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and multiplexed; [0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end. Moreover, the decoding end can demultiplex and decode the remaining bitstreams without information about the data size of each bitstream obtained by dividing the remaining bitstreams. Therefore, the coding end and the decoding end can accurately and efficiently control a bitrate).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Lee for the purpose of reducing the size of data to be transmitted as taught by Lee ([0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end).

Regarding claim 16, Bestgen discloses a genomic decoder (1313) for the decompression of compressed genomic Access Units (134) ([0011], By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in ) said genomic decoder (1313) comprising: a Genomic Dataset Header and Master Index Table (133), and a parsing means (135) configured to parse said Genomic Dataset Header and Master Index Table (133) into encoded clusters signatures (137) ([0014], Embodiments of the invention may execute a database query using the Encoded Matrix Index. In response to a database query that includes a term associated with a sub-column defined in a column of a database table, a variation data structure is accessed from the Encoded Matrix Index to determine whether any variation exists between rows belonging to the sub-column of the database table; [0036], an encoded vector index ("EVI"). An EVI is a data structure that is made up of two primary components: a symbol table and a vector. The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. The statistical information typically includes a numeric `gray` code identifying the key, the first and last rows where the key is found, and the number of times the key appears in the table. The vector corresponds to the actual rows in the table and contains a list of byte codes indicating which key each row contains), a signatures decoder (139) configured to decode said encoded clusters signatures (137) into clusters signatures (1311) ([0036], The symbol table contains the distinct key values in the rows covered, as well as statistical information about each key. The statistical information typically includes a numeric `gray` code identifying the key, the first and last rows where the key is found, and the number of times the key appears in the table. 
The vector corresponds to the actual rows in the table and contains [0068], consider a query such as the following: 
SELECT Patient_ID WHERE Sequence..Nucleotide[2]="A" AND Sequence..Nucleotide[4]="T" FROM Schema/SeqData 
The sequence column in table 300 has a sub-column named Nucleotide defined in the EMI structure in FIG. 13 and FIG. 14. The query values are mapped to bitmap representations using the map data structure 322 associated with the Nucleotide sub-column. For example, using the map data structure 322, the first predicate "A" is mapped to "00" and the second predicate "T" is mapped to "11". The element length in the sub-column data structure 320 is checked and is a "1" for both predicates, indicating that the sub-column Nucleotide is an array of values. For the first predicate, a pointer is set to an offset of 2, indicating the third sub-unit in the Nucleotide sub-column … The mapped predicate value "11" is now compared to the bitmap values for each of the rows in the value vector 334, with a match being found at row 2. Row 2 is returned and when ANDed with the results from the first predicate, selects the Patient_ID "9002" from row 2 of table 300), and one or more descriptors decoders (1310), configured to decode the genomic descriptors into uncompressed reads of sequences of nucleotides (1312) ([0037], As noted above, embodiments of the invention may be used in connection with storing and accessing sequence type data in a database table. To address concerns such as using a relational table to store sequence data in the form of large strings where a significant portion of the strings contains overlapping sequence data, or storing sequence data across multiple columns or multiple rows, each having a portion of the sequence data … the illustrated embodiments utilize a different type of index, referred to herein as an Encoded Matrix Index (EMI), to better ; [0038], An Encoded Matrix Index (EMI) is a data structure that includes at least a reference data structure, a variation data structure and value data structure. 
The reference data structure generally contains a reference value used to compare against contents of portions of a column in a database table; [0060], In the above query, no values from the "sequence" column were returned in the query results. If the query had asked that the values be returned, the EMI engine would use the result of the bitmap AND or OR processing to determine which rows satisfy the query, and then probe into the rows as seen in flowchart 200 in FIG. 11. If the sub-column associated with the query term is an array ("Yes" branch of decision block 202), the pointer into the sub-unit data structure is set similar to above by setting the pointer to the first sub-unit plus the element number within the sub-column array (block 204)… The element retrieved from the value vector is converted from a bitmap representation back to a value by referencing the appropriate map data structure associated with the sub-column (block 218). This process may be repeated for each of the rows matching the query criteria).

Bestgen fails to explicitly disclose a demultiplexer (132) for demultiplexing compressed genomic Access Units (134), or entropy decoders (136) configured to decompress said compressed genomic Access Units into blocks of syntax elements.  

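Gurulogic teaches entropy decoders (136) configured to decompress said compressed Access Units into blocks of syntax elements (page 1 lines 1-10, The present disclosure relates generally to data compression, and more specifically, to encoders for encoding input data (D1) to generate corresponding encoded data (E2), and decoders for decoding the encoded data (E2) to generate corresponding decoded data (D3)).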
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.


Lee teaches a demultiplexer (132) for demultiplexing compressed data ([0010], The present invention provides a method and apparatus in which, based on the length of each frame of a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and then multiplexed; [0011], The present invention also provides a method and apparatus, in which a stream generated by multiplexing bitstreams coded to have different frame lengths by a plurality of coders in a coding end is demultiplexed in order to detect the length of each frame of a reference bitstream and the remaining bitstreams are extracted using the detected length of each frame of the reference bitstream; [0086], based on the length of each frame of a bitstream selected as a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and multiplexed; [0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end. Moreover, the decoding end can demultiplex and decode the remaining bitstreams without information about the data size of each bitstream obtained by dividing the remaining bitstreams. Therefore, the coding end and the decoding end can accurately and efficiently control a bitrate).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Lee for the purpose of reducing the size of data to be transmitted as taught by Lee ([0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end).



Regarding claim 17, modified Bestgen discloses an unmapped sequence reads decoder (1313) configured as per claim 16, comprising a genomic decoder (148) for the decompression of a compressed genomic stream (1410) and bitstreams of unmapped sequence reads (145) ([0011], By storing only portions of sequences in the column that vary from the reference values, the amount of information necessary to represent the sequences in the index may be greatly reduced as compared to storing the sequences themselves. Additionally, the sequences can be easily reconstructed based upon the reference values and the variations therefrom when the sequence data or portions thereof are retrieved from the index; [0012], The element may include an array having a plurality of values for each row in the database table, or the element may include a single value for each row in the database table. In some embodiments, the variation data structure may be a binary structure having a bit for each element of the column, where each bit within the structure indicates a variation for a corresponding element; [0036], A database engine may use the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. 
Similar to a bitmap index, intermediate dynamic bitmaps can be AND'ed and OR'ed together to satisfy an ad hoc query), said genomic decoder (148) comprising:  one or more genomic descriptors decoders (146 – 147), configured to decode the genomic descriptors into classified reads of sequences of nucleotides (1411) such as syntax elements named genomic descriptors (145) ([0012], In one embodiment of the invention, an Encoded Matrix ), genomic data classes decoders (149) configured to selectively decode said classified reads of sequences of nucleotides on one or more reference sequences so as to produce uncompressed reads of sequences of nucleotides, and to produce uncompressed raw, unmapped and unaligned sequence reads (1414) and clusters signatures (1415) ([0037], As noted above, embodiments of the invention may be used in connection with storing and accessing sequence type data in a database table. To address concerns such as using a relational table to store sequence data in the form of large strings where a significant portion of the strings contains overlapping sequence data, or storing sequence data across multiple columns or multiple rows, each having a portion of the sequence data … the illustrated embodiments utilize a different type of index, referred to herein as an Encoded Matrix Index (EMI), to better accommodate sequence data having many unique values for a particular column, but having relatively few unique values within a sub-column; [0038], An Encoded Matrix Index (EMI) is a data structure that includes at least a reference data structure, a variation data structure and value data structure. The reference data structure generally contains a reference value used to compare against contents of portions of a column in a database table).  

Bestgen fails to explicitly disclose a demultiplexer (140) for demultiplexing compressed genomic data and metadata into genomic bitstreams (141), or entropy decoders (142-144) configured to parse said compressed genomic stream into blocks of syntax elements.  

Gurulogic teaches entropy decoders (142-144) configured to parse said compressed genomic stream into blocks of syntax elements (page 1 lines 1-10, The present disclosure relates generally to data compression, and more specifically, to encoders for encoding input data (D1) to generate corresponding encoded data (E2), and decoders for decoding the encoded data (E2) to generate corresponding decoded data (D3).  Moreover, the present disclosure relates to methods of encoding input data (D1) to generate corresponding encoded data (E2), and methods of decoding the encoded data (E2) to generate corresponding decoded data (D3); page 3 lines 1-7, In a first aspect, embodiments of the present disclosure provide an encoder including processing hardware for encoding input data (D1) to generate corresponding encoded data (E2)).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.


Lee teaches a demultiplexer (140) for demultiplexing compressed data and metadata into bitstreams ([0010], The present invention provides a method and apparatus in which, based on the length of each frame of a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and then multiplexed; [0011], The present invention also provides a method and apparatus, in which a stream generated by multiplexing bitstreams coded to have different frame lengths by a plurality of coders in a coding end is demultiplexed in order to detect the length of each frame of a reference bitstream and the remaining bitstreams are extracted using the detected length of each frame of the reference bitstream; [0086], based on the length of each frame of a bitstream selected as a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and multiplexed; [0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end. Moreover, the decoding end can demultiplex and decode the remaining bitstreams without information about the data size of each bitstream obtained by dividing the remaining bitstreams. Therefore, the coding end and the decoding end can accurately and efficiently control a bitrate).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Lee for the purpose of reducing the size of data to be transmitted as taught by Lee ([0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end).





Claims 14-15 are rejected under 35 U.S.C. §103 as being unpatentable over Bestgen et al. (US Patent Application Publication No. 2012/0179712, hereafter referred to as “Bestgen”) in view of Gnirke et al (US Patent Application Publication No. 2014/0228223, hereafter referred to as “Gnirke”), Kärkkäinen et al [Gurulogic Microsystems Oy] (Foreign Patent Application Publication No. WO 2015/197201 A1, hereafter referred to as “Gurulogic”) and Lee et al. (US Patent Application Publication No. 2012/0170595, hereafter referred to as “Lee”).

Regarding claim 14, modified Bestgen discloses an unmapped sequence reads encoding unit (1115) configured as per claim 12 (see the previous rejection of claim 12, which also requires the Gnirke reference), comprising a genomic encoder (1210) for the compression of genome sequence data (121), said genome sequence data (121) comprising reads of sequences of nucleotides ([0003], Relational databases are being considered for storing sequence data that has the characteristics of long repeating sequences generally consisting of a relatively small number of distinct values. …. An example of such sequence data is genomic sequence data; [0004], Genomic sequence data is typically represented as long sequences of letters or numbers each having a small number of distinct values. …. Base pairs are two nucleotides on opposite complementary DNA or RNA strands connected via a hydrogen bond. Each nucleotide is typically represented by the letters A, C, G, and T; [0011], Embodiments of the invention provide a method, apparatus and program product that utilize an Encoded Matrix Index data structure that may be used by database queries to address deficiencies in searching sequence type data stored in relational databases. An Encoded Matrix Index is based in part ; [0015], if any new variation exists as a result of the update between the sub-column and the reference data structure the variation data structure is updated to indicate the variation; [0031], FIG. 
13 shows an exemplary table containing genome sequence data and an associated Encoded Matrix Index control data structure;), said genomic encoder (1210) comprising:  an aligner unit (122), configured to align said reads to one or more reference sequences thereby creating aligned reads ([0011], An Encoded Matrix Index is based in part on the premise of storing reference values for portions of sequence data that often represent the most common values found in a particular type of sequence data; [0015], if any new variation exists as a result of the update between the sub-column and the reference data structure the variation data structure is updated to indicate the variation), a data classification unit (124) configured to classify said aligned reads according to specified matching rules with the one or more pre-existing reference sequences or constructed reference sequences thereby creating classes of aligned reads (128) ([0015], if any new variation exists as a result of the update between the sub-column and the reference data structure the variation data structure is updated to indicate the variation; [0031], FIG. 13 shows an exemplary table containing genome sequence data and an associated Encoded Matrix Index control data structure; [0061], When columns are inserted or updated within a table, the EMI engine is used to update any EMI associated with that table illustrated in the flowchart 250 in FIG. 12. 
If, for example, a column is inserted into the table, the EMI engine creates a variation vector, which includes one bit for every sub-column/sub-unit in the new column along with the offsets and lengths associated with the sub-units in the sub-unit data structure), one or more descriptor encoding units (125-127) configured to encode said classified aligned reads as blocks of syntax elements by selecting said syntax elements according to said classes of aligned reads ([0012], The element may include an array having a plurality of values for each row in the database table, or the element may include a single value for each row in the database table. In some embodiments, the variation data structure may be a binary structure having a bit for each element of the column, where each bit within the structure indicates a variation for a corresponding element; [0036], A database engine may use the vector portion of the EVI to build a dynamic bitmap that contains one bit for each row in the table. If the row satisfies a query selection, the bit is set on. If the row does not satisfy the query selection, the bit is set off. Similar to a bitmap index, intermediate dynamic bitmaps can be AND'ed and OR'ed together to satisfy an ad hoc query).  

Bestgen fails to explicitly disclose one or more entropy encoding units (1110), configured to compress said blocks of syntax elements according to their statistical properties to produce Genomic Streams (1215) or a multiplexer (1216) for multiplexing the compressed genomic data and metadata.  

Gurulogic teaches wherein genomic data are entropy encoded and decoded to compress said blocks of syntax elements according to their statistical properties to produce Genomic data streams (page 1 lines 1-10, The present disclosure relates generally to data compression, and more specifically, to encoders for encoding input data (D1) to generate corresponding encoded data (E2), and decoders for decoding the encoded data (E2) to generate corresponding decoded data (D3).  Moreover, the present disclosure relates to methods of encoding input data (D1) to generate corresponding encoded data (E2), and methods of decoding the encoded data (E2) to generate corresponding decoded data (D3); page 3 lines 1-7, In a first aspect, embodiments of the present disclosure provide an encoder including processing hardware for encoding input data (D1) to generate corresponding encoded data (E2)).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Gurulogic for the purpose of compressing data in a genomic database as taught by Gurulogic.


Lee teaches a multiplexer (1113) for multiplexing and demultiplexing the compressed data and metadata ([0010], The present invention provides a method and apparatus in which, based on the length of each frame of a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and then multiplexed; [0011], The present invention also provides a method and apparatus, in which a stream generated by multiplexing bitstreams coded to have different frame lengths by a plurality of coders in a coding end is demultiplexed in order to detect the length of each frame of ; [0086], based on the length of each frame of a bitstream selected as a reference bitstream from among bitstreams coded to have different frame lengths by a plurality of coders, the remaining bitstreams are divided and multiplexed; [0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end. Moreover, the decoding end can demultiplex and decode the remaining bitstreams without information about the data size of each bitstream obtained by dividing the remaining bitstreams. Therefore, the coding end and the decoding end can accurately and efficiently control a bitrate).
It would have been obvious to one of ordinary skill in the art at the time of the invention to modify Bestgen with the teachings of Lee for the purpose of reducing the size of data to be transmitted as taught by Lee ([0087], By doing so, it is not necessary for a coding end to generate and transmit information about the data size of each bitstream obtained by dividing the remaining bitstreams except for the reference bitstream, thereby reducing the complexity of and the time required for a coding process and reducing the size of data to be transmitted from the coding end to a decoding end).

Regarding claim 15, Bestgen discloses the genomic encoder of claim 14 further comprising coding means suitable for executing the coding ([0049], Database queries and creation of index structures, as the EMI above, are generally implemented in a database management system executing in a compute environment such as an individual machine or a client-server environment. FIG. 9 illustrates an exemplary hardware and software environment for an apparatus 120 suitable for database management, execution of queries and creation of Encoded Matrix Index entities consistent with the invention).

Conclusion and Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Robert Stevens, whose telephone number is (571) 272-4102. The examiner can normally be reached on M-F 6:00 – 2:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached on (571) 272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ROBERT STEVENS/
Primary Examiner, Art Unit 2164




February 25, 2021