DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-7 are canceled. Claims 8-22 are currently pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 2/28/2022 was filed before the first Office action. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Non-patent document citation numbers 1-2 were not provided.
Citations 3-7 were provided in application 15/977646.
Drawings
The drawings filed on 8/16/2021 are acceptable for examination purposes.

Claim Objections
Claim 9 is objected to because of the following: Claim 9 depends from itself. Claim 9 should depend from independent claim 8. Appropriate correction is required.



Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
The following is a correspondence between the claims of instant application 17/406188 and the claims of U.S. 11,094,397.

17/403188
U.S. 11,094,397 (15/977646)
8. A system for encoding nucleic acid sequence data on a storage medium, the system comprising one or more processors configured to:
receive data representing a nucleic acid sequence;
divide the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence;
encode the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium; and
store the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium.

1. A system for securely communicating genomic information from a secure computing environment to an unsecure computing environment, the system comprising: one or more hardware processors located in the secure computing environment;
a memory located in the secure computing environment, the memory storing one or more programs, the one or more programs configured to be executed by the one or more hardware processors and including instructions to:
receive data representing a nucleic acid sequence;
receive data indicating a security level associated with the received data representing a nucleic acid sequence;
divide the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence;
store data, in the secure computing environment, representing one or more of the plurality of portions;
encode the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure comprises setting a pre-defined false-positive probability of the probabilistic data structure, and wherein the false-positive probability is set at least in part based on the indicated security level of the data representing the nucleic acid sequence;
transmit the encoded nucleic acid sequence, including the probabilistic data structure, for storage in the unsecure computing environment, in association with metadata corresponding to the data representing the nucleic acid sequence.

9. The system of claim 9, wherein the probabilistic data structure is configured to be queried by an element, and to responsively generate data indicating whether the element is a member of the set.

2. The system of claim 1, wherein the probabilistic data structure is configured to be queried by an element, and to responsively generate data indicating whether the element is a member of the set.
10. The system of claim 9, wherein generating data indicating whether the element is a member of the set comprises one of generating data indicating that the element is definitely not a member of the set and generating data indicating that the element is probably a member of the set.

3. The system of claim 1, wherein generating data indicating whether the element is a member of the set comprises one of generating data indicating that the element is definitely not a member of the set and generating data indicating that the element is probably a member of the set. 

11. The system of claim 9, wherein generating the probabilistic data structure comprises selecting a false-positive probability of the probabilistic data structure based at least in part on the available storage resources.

Claim 1: see above and claim 5: The system of claim 1, wherein the predefined false-positive probability is set at least in part in accordance with available storage resources.

21. A non-transitory computer-readable storage medium storing instructions for encoding nucleic acid sequence data on a storage medium, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
receive data representing a nucleic acid sequence;
divide the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence;
encode the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium; and
store the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium.

10. A non-transitory computer-readable storage medium storing one or more programs for securely communicating genomic information from a secure computing environment to an unsecure computing environment, the one or more programs configured to be executed by one or more processors located in the secure computing environment, the one or more programs including instructions to: receive data representing a nucleic acid sequence;
 receive data indicating a security level associated with the received data representing a nucleic acid sequence; divide the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence; 
store data, in the secure computing environment, representing one or more of the plurality of portions; 
encode the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure comprises setting a pre-defined false-positive probability of the probabilistic data structure, and wherein the false-positive probability is set at least in part based on the indicated security level of the data representing the nucleic acid sequence;
transmit the encoded nucleic acid sequence, including the probabilistic data structure, for storage in the unsecure computing environment, in association with metadata corresponding to the data representing the nucleic acid sequence.

22. A method for encoding nucleic acid sequence data on a storage medium, the method performed by a system comprising one or more processors, the method comprising:
receiving data representing a nucleic acid sequence;
dividing the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence;
encoding the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium; and
storing the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium.

9. A method for storing one or more programs for securely communicating genomic information from a secure computing environment to an unsecure computing environment, the method comprising: at a system comprising one or more processors located in the secure computing environment and a memory located in the secure computing environment:
receiving data representing a nucleic acid sequence;
receiving data indicating a security level associated with the received data representing a nucleic acid sequence;
dividing the data into a plurality of portions, wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence;
 storing data, in the secure computing environment, representing one or more of the plurality of portions; 
encoding the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions, data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure comprises setting a pre-defined false-positive probability of the probabilistic data structure, and wherein the false-positive probability is set at least in part based on the indicated security level of the data representing the nucleic acid sequence; transmitting the encoded nucleic acid sequence, including the probabilistic data structure, for storage in the unsecure computing environment, in association with metadata corresponding to the data representing the nucleic acid sequence.




Claims 8-11 and 21-22 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5, and 9-10 of U.S. Patent No. 11,094,397. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-3, 5, and 9-10 of U.S. Patent No. 11,094,397 contain every element of claims 8-11 and 21-22 of the instant application and as such anticipate claims 8-11 and 21-22 of the instant application.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 8-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because claim 8 recites “a system for encoding nucleic acid sequence data on a storage medium, the system comprising one or more processors configured to: ….” The specification states at [0052], lines 8-9, “…The one or more processors in processor 102 may implement virtual machine technologies…” and describes the storage medium at [0051], lines 6-7, as either a non-transitory or a transitory storage medium. Given the broadest reasonable interpretation, the one or more processors recited in claim 8 encompass virtual processors, and the storage medium recited in claim 8 encompasses a transitory medium. Claim 8 therefore recites a system without any hardware elements. The claims lack the necessary physical articles or objects to constitute a machine or manufacture within the meaning of 35 U.S.C. 101. They are clearly not a series of steps or acts to be a process, nor are they a combination of chemical compounds to be a composition of matter. As such, they fail to fall within a statutory category. They are, at best, functional descriptive material.
Claim 8 is rejected under 35 U.S.C. 101.
Dependent claims 9-20 are rejected because they fail to cure the deficiencies of claim 8.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.




Claims 8-11 and 13-22 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. 2017/0068776 A1 issued to Ricardo Godinez-Moreno (“Godinez-Moreno”) and U.S. Pub. 2014/0274752 A1 issued to Erich D. Blume (“Blume”), in view of US 2015/0220684 issued to Nicholas Boyd Greenfield (“Greenfield”).
As per claim 8, Godinez-Moreno teaches a system for encoding nucleic acid sequence data on a storage medium, the system comprising one or more processors (Godinez-Moreno: Abstract; [0028], line 1-2, “…FIG. 1 illustrates a data-processing system 100…”, and line 16-17, “… dedicated single or multi-core processor…”) configured to:
receive data (read data) representing a nucleic acid sequence (Godinez-Moreno:
[0058], line 2-4, “… obtain read data at process block 208. Read data can be provided
via a sequencing instrument 106…” [0029], line 10-12, “… Read data can be a subset of
the sequence of base nucleotides comprising an organism's DNA…”);

divide the data into a plurality of portions (a set of encoded k-mers), wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence (Godinez-Moreno: [0062], line 1-7, “… At process block 406 the read data can be encoded. In one embodiment, the read data can be encoded by applying a mask to the read data. For example, the same mask used to encode the reference genome can be applied to the read data. The application of a mask to the read data can produce a set of encoded k-mers that correspond to the data in the loaded read data…” <examiner note: the read data is divided into k-mers (substrings of read data) with length k>);
store data, in the secure computing environment, representing one or more of the
plurality of portions (Godinez-Moreno: figure 3 and [0049], data is stored);

encode the data representing the nucleic acid sequence by generating, based on the data representing one or more of the plurality of portions … (Godinez-Moreno: [0062], line 1-7, “… At process block 406 the read data can be encoded. In one embodiment, the read data can be encoded by applying a mask to the read data. For example, the same mask used to encode the reference genome can be applied to the read data. The application of a mask to the read data can produce a set of encoded k-mers that correspond to the data in the loaded read data…” <examiner note: the read data is divided into k-mers (substrings of read data) with length k>);

store the encoded nucleic acid sequence … on the storage medium (Godinez-Moreno: figure 3 and [0049], as data is stored).
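For illustration only, and not as a characterization of the implementation in Godinez-Moreno, the division of read data into k-mers mapped above can be sketched as follows. The function names `kmers` and `encode_kmer` and the 2-bit base code are hypothetical; the 2-bit code merely stands in for an encoding step and is not the mask described in the reference.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every overlapping length-k substring (k-mer) of a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no k-mers (empty range).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical 2-bit code per base (an assumption for this sketch).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


read = "ACGTAC"
portions = kmers(read, 3)                   # sub-strings of the read
encoded = [encode_kmer(p) for p in portions]
print(portions)
print(encoded)
```

Each element of `portions` is a sub-string of the read, which is the sense in which the examiner note treats k-mers as the claimed "plurality of portions".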
Godinez-Moreno does not explicitly teach “… data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium” and “store the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium.”

Blume teaches “… data including a probabilistic data structure (Bloom filters) that represents each of the one or more of the plurality of portions as members of a set (chromosome), wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium” (Blume: [0090], line 3-5, “… multiple Bloom filters are produced, one for each location of interest in the reference sequence (e.g., one for each chromosome in a genome)…” <examiner note: multiple Bloom filters are for chromosomes/organism>), a probabilistic data structure that represents … members of a set (<examiner note: A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set>), wherein each of the plurality of elements (bit 1) corresponds to a nucleic acid sub-string (slices 103a-103c) of the genomic reference data of the respective organism (chromosome reference sequence 101) (fig. 1a, <examiner note: for instance, slice 103a (i.e., a substring of reference 101) is hashed to populate bit 1s (elements) in the bit array at positions indicated by hash values>).
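As a minimal sketch of the general technique referred to in the examiner notes above (and not of Blume's actual implementation), a Bloom filter populated with sub-strings of a reference sequence might look like the following. The class name `BloomFilter`, the salted SHA-256 hashing scheme, and the parameter values are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array in which each added item sets num_hashes bit positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive each position by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely not in the set. True: probably in the set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


reference = "ACGTACGTGA"   # hypothetical reference sequence
k = 4
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for i in range(len(reference) - k + 1):
    bloom.add(reference[i:i + k])   # hash each slice into the bit array

print(bloom.might_contain("ACGT"))  # a slice that was added
```

A `False` result is conclusive because every added item sets all of its bit positions; a `True` result is only probable because unrelated items can set the same bits.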
It would have been obvious to one of ordinary skill in the art before the effective filing date to encode data into Bloom filters (e.g., probabilistic data structures) as disclosed by Blume into Godinez-Moreno because of the advantages Blume identifies: [0063] There are certain advantages to using Bloom filters for aligning reads to reference sequences. For example, Bloom filters do not provide false negatives, and to the extent that they provide false positives, these do so at a pre-set level dictated by the design of the Bloom filter. Therefore, the filter can be constructed to meet a false positive rate that is acceptable for a given application. [0064] Further, Bloom filters are able to very rapidly test whether a read aligns in a reference sequence or portion that reference sequence. In certain embodiments, a Bloom filter requires about 10 or fewer memory accesses to align a 36 base pair tag. In some cases, the filter requires only 9 or fewer memory accesses for such alignment. In many conventional computer systems this translates total align time of a fraction of a millisecond or less per read.
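By way of illustration of how a filter "can be constructed to meet a false positive rate", the standard textbook sizing approximation for a Bloom filter is sketched below. The formulas are the commonly used approximations and are not taken from Blume; the function name `bloom_parameters` is hypothetical.

```python
import math


def bloom_parameters(num_items: int, target_fp_rate: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) for a target false-positive rate.

    Uses the standard approximations
        m = -n * ln(p) / (ln 2)^2   and   k = (m / n) * ln 2.
    """
    if num_items <= 0 or not 0.0 < target_fp_rate < 1.0:
        raise ValueError("need num_items > 0 and 0 < target_fp_rate < 1")
    num_bits = math.ceil(-num_items * math.log(target_fp_rate) / math.log(2) ** 2)
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes


# Example: 1,000 k-mers at a 1% false-positive rate.
print(bloom_parameters(1000, 0.01))
```

Under these approximations a lower target false-positive rate requires a larger bit array, which is the trade-off between accuracy and storage discussed in the rejection.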
Godinez-Moreno and Blume do not explicitly teach “store the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium”. Greenfield does teach this limitation (Greenfield: [0087], line 1-5, “…A lookup operation will lookup the end value associated with a key. In the biological sequence information application, a biological sequence k-mer (i.e., a fragment or section of a biological sequence) can be queried and a characterization returned...”; [0032], line 14-17, “… A B-field lookup process can be performed for a set of fragments in the full or partial DNA sequence data to obtain a set of characterizations. From the set of characterizations, a characterization report can be generated, such as the characterization reports shown in FIGS. 5A-5D. Alternative use cases may use any suitable type of associated value for the characterization…” <examiner note: Bacillus cereus is associated with the B-field data structure. This set of characterizations (e.g., Bacillus cereus) is stored in a characterization report (i.e., data structure)>).
It would have been obvious to one of ordinary skill in the art before the effective filing date to include encoding data (i.e., biological characterization) into a B-field data structure (i.e., probabilistic data structure) as disclosed by Greenfield into Godinez-Moreno and Blume because the B-field data structure can probabilistically store key-value pairs in a space-efficient manner for in-memory use. For many common use-cases or configurations, the B-field data structure can store billions of elements using only tens of gigabytes of storage (or a few bytes per key-value pair). Such space requirements can scale linearly with the number of elements in the B-field, n. Stated in an alternative manner, the B-field data structure has O(n) space complexity.
As per claim 9, same as claim arguments above and Blume teaches:
 The system of claim 9, wherein the probabilistic data structure (bloom filter) is configured to be queried by an element, and to responsively generate data indicating whether the element is a member of the set (Blume: [0111], line 1-9, “… The reads are provided to a membership tester 303… Tester 303 includes the logic necessary for testing each of the reads from sequencer 301 against the Bloom filters in each of the Chromosome Membership Objects…” <examiner note: each read of nucleic acid sample is tested against bloom filters>).

As per claim 10, same as claim arguments above and Blume teaches:
wherein generating data indicating whether the element is a member of the set
comprises one of generating data indicating that the element is definitely not a member
of the set and generating data indicating that the element is probably a member of
the set (Blume: figure 2b).

As per claim 11, same as claim arguments above and Greenfield teaches:
wherein generating the probabilistic data structure comprises 
selecting a false-positive probability of the probabilistic data structure based at 
least in part on the available storage resources (Greenfield: [0026],
“… A false positive error can be defined as the rate at which a data structure returns a
value (y) when the key (x) does not exist in the set of stored x values (S). When x does
not exist in S, the B-field data structure query operation should indicate x is out of
range, such as by returning a special value ⊥. B-field data structures exhibit false
positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be
arbitrarily small. Additionally, in the problem space of biological sequence queries, the
query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such
that false positive errors are not substantially detrimental to the overall objective of a
particular query because sequencing errors are present in the query…”<examiner
note: storage is associated with processor>).
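To illustrate the trade-off between space and false-positive rate noted in the passage above, the following sketch estimates the false-positive rate achievable within a fixed storage budget. It uses the standard approximation for an ordinary Bloom filter, not the B-field data structure of Greenfield; the function name `achievable_fp_rate` and the example figures are assumptions.

```python
import math


def achievable_fp_rate(storage_bytes: int, num_items: int) -> tuple[float, int]:
    """Estimate (false_positive_rate, num_hashes) for a given byte budget.

    Standard approximation: p = (1 - exp(-k * n / m)) ** k,
    with k chosen near the optimum (m / n) * ln 2.
    """
    if storage_bytes <= 0 or num_items <= 0:
        raise ValueError("storage_bytes and num_items must be positive")
    num_bits = storage_bytes * 8
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    fp_rate = (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
    return fp_rate, num_hashes


# Example: 1,000 k-mers under two hypothetical storage budgets.
for budget in (600, 1200):
    rate, hashes = achievable_fp_rate(budget, 1000)
    print(budget, hashes, round(rate, 4))
```

In this model, doubling the available storage for the same number of items lowers the estimated false-positive rate, so the rate can be selected as a function of the storage available.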
	 

As per claim 13, same as claim arguments above and Greenfield teaches: 
wherein generating the probabilistic data structure is based at least in part on a target file size for the probabilistic data structure (Greenfield:[0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).

As per claim 14, same as claim arguments above and Greenfield teaches:
The system of claim 14, wherein generating the probabilistic data structure comprises selecting a false-positive probability of the probabilistic data structure based at least in part on the target file size (Greenfield:[0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).

As per claim 15, same as claim arguments above and Blume teaches:
wherein generating the probabilistic data structure is based at least in part on available processing resources of the system (Blume: [0090], line 3-5, “… multiple Bloom filters are produced, one for each location of interest in the reference sequence (e.g., one for each chromosome in a genome)…” <examiner note: A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set>).

As per claim 16, same as claim arguments above and Greenfield teaches:
The system of claim 15, wherein generating the probabilistic data structure comprises selecting a false-positive probability of the probabilistic data structure based at least in part on the available processing resources of the system (Greenfield:[0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).

As per claim 17, same as claim arguments above and Blume teaches:
The system of claim 9, wherein generating the probabilistic data structure is based at 
least in part on requirements for accuracy of comparisons to be made against the 
probabilistic data structure (Blume:[0090],  line 3-5, “… multiple Bloom filters are produced, one for each location of interest in the reference sequence (e.g., one for each chromosome in a genome)…” <examiner note: A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set>).

As per claim 18, same as claim arguments above and Greenfield teaches:
The system of claim 17, wherein generating the probabilistic data structure comprises selecting a false-positive probability of the probabilistic data structure based at least in part on the requirements for accuracy of comparisons to be made against the probabilistic data structure (Greenfield:[0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).

As per claim 19, same as claim arguments above and Greenfield teaches:
wherein generating the probabilistic data structure is based at least in part on a level of sensitivity of the data representing the nucleic acid sequence (Greenfield: [0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).
As per claim 20, same as claim arguments above and Greenfield teaches:
The system of claim 19, wherein generating the probabilistic data structure comprises 
selecting a false-positive probability of the probabilistic data structure based at least 
in part on the level of sensitivity of the data representing the nucleic acid sequence (Greenfield:[0026], “… A false positive error can be defined as the rate at which a data structure returns a value (y) when the key (x) does not exist in the set of stored x values (S). When x does not exist in S, the B-field data structure query operation should indicate x is out of range, such as by returning a special value ⊥. B-field data structures exhibit false positives at an error rate of α. At the cost of lesser space efficiency, α can be set to be arbitrarily small. Additionally, in the problem space of biological sequence queries, the query input (e.g., a sequence of DNA reads) may have sequencing errors at a rate such that false positive errors are not substantially detrimental to the overall objective of a particular query because sequencing errors are present in the query…”).

Claim 21 is rejected based on the same rationale as claim 8 above.

As per claim 22, Godinez-Moreno teaches a method for encoding nucleic acid sequence data on a storage medium, the method performed by a system comprising one or more processors (Godinez-Moreno: Abstract; [0028], line 1-2, “…FIG. 1 illustrates a data-processing system 100…”, and line 16-17, “… dedicated single or multi-core processor…”), the method comprising:
receiving data (reading data) representing a nucleic acid sequence (Godinez-Moreno: [0058], line 2-4, “… obtain read data at process block 208. Read data can be provided via a sequencing instrument 106…” [0029], line 10-12, “… Read data can be a subset of the sequence of base nucleotides comprising an organism's DNA…”);
dividing the data into a plurality of portions (a set of encoded k-mers), wherein each of the plurality of portions represents a sub-string of the nucleic acid sequence (Godinez-Moreno: [0062], line 1-7, “… At process block 406 the read data can be coded. In one embodiment, the read data can be encoded by applying a mask to the read data. For example, the same mask used to encode the reference genome can be applied to the read data. The application of a mask to the read data can produce a set of encoded k-mers that correspond to the data in the loaded read data…” <examiner note: the read data is divided into k-mers (substrings of read data) with length k>);
store data, in the secure computing environment, representing one or more of the
plurality of portions (Godinez-Moreno: figure 3 and [0049], data is stored);
encoding the data representing the nucleic acid sequence by generating, based
on the data representing one or more of the plurality of portions … (Godinez-Moreno:
[0062], line 1-7, “… At process block 406 the read data can be encoded. In one
embodiment, the read data can be encoded by applying a mask to the read data. For
example, the same mask used to encode the reference genome can be applied to the
read data. The application of a mask to the read data can produce a set of encoded
k-mers that correspond to the data in the loaded read data…” <examiner note: the read
data is divided into k-mers (substrings of read data) with length k>);

storing the encoded nucleic acid sequence … on the storage medium (Godinez-Moreno: figure 3 and [0049], as data is stored).

Godinez-Moreno does not explicitly teach “… data including a probabilistic data structure that represents each of the one or more of the plurality of portions as members of a set, wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium” and “storing the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium”. Blume teaches “… data including a probabilistic data structure (Bloom filters) that represents each of the one or more of the plurality of portions as members of a set (chromosome), wherein generating the probabilistic data structure is based at least in part on available storage resources on the storage medium” (Blume: [0090], line 3-5, “… multiple Bloom filters are produced, one for each location of interest in the reference sequence (e.g., one for each chromosome in a genome)…” <examiner note: multiple Bloom filters are for chromosomes/organism>), probabilistic data structure that represents … members of a set (<examiner note: A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set>), wherein each of the plurality of elements (bit 1) corresponds to a nucleic acid sub-string (slices 103a-103c) of the genomic reference data of the respective organism (chromosome reference sequence 101) (fig. 1a, <examiner note: for instance, slice 103a (i.e., a substring of reference 101) is hashed to populate bit 1s (elements) in the bit array at positions indicated by hash values>).
It would have been obvious to one of ordinary skill in the art before the effective filing date to encode data into Bloom filters (e.g., probabilistic data structures) as disclosed by Blume into Godinez-Moreno because, as Blume states at [0063]-[0064]: “[0063] There are certain advantages to using Bloom filters for aligning reads to reference sequences. For example, Bloom filters do not provide false negatives, and to the extent that they provide false positives, these do so at a pre-set level dictated by the design of the Bloom filter. Therefore, the filter can be constructed to meet a false positive rate that is acceptable for a given application. [0064] Further, Bloom filters are able to very rapidly test whether a read aligns in a reference sequence or portion that reference sequence. In certain embodiments, a Bloom filter requires about 10 or fewer memory accesses to align a 36 base pair tag. In some cases, the filter requires only 9 or fewer memory accesses for such alignment. In many conventional computer systems this translates total align time of a fraction of a millisecond or less per read.”
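For illustration only, the mechanism described in the examiner notes above (slices of a reference sequence hashed to set bits in a bit array, then queried for set membership) can be sketched as follows. This is a minimal sketch under stated assumptions, not Blume's implementation: the class and function names, the SHA-256 double-hashing scheme, the filter size, and the example sequence are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not from Blume)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        # Set the bit at every position selected by the hash values.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True if all selected bits are set; false positives are possible,
        # false negatives are not.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return every length-k substring (slice) of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


reference = "ACGTACGTTAGC"      # hypothetical reference sequence
bloom = BloomFilter(1024, 4)    # hypothetical size and hash count
for kmer in kmers(reference, 4):
    bloom.add(kmer)
```

A query such as `"ACGT" in bloom` returns True for every slice that was added. A slice that was never added usually returns False, but may return True at the filter's false-positive rate.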
Godinez-Moreno and Blume do not explicitly teach “storing the encoded nucleic acid sequence, including the probabilistic data structure, on the storage medium”. Greenfield does teach this limitation at (Greenfield: [0087], line 1-5, “… A lookup operation will lookup the end value associated with a key. In the biological sequence information application, a biological sequence k-mer (i.e., a fragment or section of a biological sequence) can be queried and a characterization returned...”; [0032], line 14-17, “… A B-field lookup process can be performed for a set of fragments in the full or partial DNA sequence data to obtain a set of characterizations. From the set of characterizations, a characterization report can be generated, such as the characterization reports shown in FIGS. 5A-5D. Alternative use cases may use any suitable type of associated value for the characterization…”) <examiner note: Bacillus cereus is evidently an organism associated with the metadata (347: Bacillus cereus) associated with the B-field data structure. This set of characterizations (e.g., Bacillus cereus) is stored in a characterization report (i.e., a data structure)>.
It would have been obvious to one of ordinary skill in the art before the effective filing date to include and encode data (i.e., biological characterization) into a B-field data structure (i.e., a probabilistic data structure) as disclosed by Greenfield into Godinez-Moreno and Blume because the B-field data structure can probabilistically store key-value pairs in a space-efficient manner for in-memory use. For many common use-cases or configurations, the B-field data structure can store billions of elements using only tens of gigabytes of storage (or a few bytes per key-value pair). Such space requirements can scale linearly with the number of elements in the B-field, n. Stated in an alternative manner, the B-field data structure has O(n) space complexity.

Allowable Subject Matter
Claim 12 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUSAN F RAYYAN whose telephone number is (571)272-1675. The examiner can normally be reached Monday and Tuesday 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Beausoliel can be reached on 571-272-3645. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.F.R/
Examiner, Art Unit 2167
August 30, 2022

/ROBERT W BEAUSOLIEL JR/
Supervisory Patent Examiner, Art Unit 2167