EzDevInfo.com

Genome

A simple, type safe, failure driven mapping library for serializing JSON to models in Swift 2.0 (Supports Linux)

where do i download gene expression data?

I want to download gene expression data generated by microarray experiments. I do not know too much about this subject, but as I understand it, rows often correspond to genes and columns correspond to samples. Ideally, I expect a matrix of gene expression data.

I've been searching on the internet, and although it may seem like there are many places to download such data, when I actually do download the data, I do not get a matrix of gene expression. Could someone please let me know where or how to download gene expression data in the format I describe above?

Any help is appreciated.


Source: (StackOverflow)

Can I use the Music Genome Project?

Is there an API or database I can access or is it a proprietary project?


Source: (StackOverflow)


Map SNP IDs to genome coordinates

I have several SNP IDs (e.g., rs16828074, rs17232800, etc.), and I want to find their coordinates in the hg19 genome from the UCSC genome website.

I would prefer using R to accomplish this. How can I do that?


Source: (StackOverflow)

Python script to use coordinates from one file and add values from matching coordinates in another file

I have an original set of genomic coordinates (chrom, start, end) in a tab delimited bed file. I also have additional tab delimited bed files that contain some of the original genomic coordinates plus a numerical value associated with each of these coordinates. These coordinates can show up multiple times in a bed file with a different numerical value each time. I need a final bed file that contains each of the original genomic coordinates with the summed number of all the values found to be associated with that specific coordinate. Examples of files I'm working with are below.

Original File:

chr1    2100    2300

chr2    3300    3600

chr1    2560    2800

Other Bed file:

chr1    2100    2300    6

chr2    3300    3600    56

chr1    2100    2300    10

Needed Output file:

chr1    2100    2300    16

chr2    3300    3600    56

chr1    2560    2800    0

I need to write a python script to do this, but I'm not really sure what the best way to do it is.
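One way to sketch this in Python (filenames below are hypothetical): read the value file once, summing values per (chrom, start, end) key in a dict, then emit every original coordinate with its total, defaulting to 0 when a coordinate never appeared.

```python
from collections import defaultdict

def sum_bed_values(original_lines, value_lines):
    # Accumulate the summed value for each exact (chrom, start, end) key.
    totals = defaultdict(int)
    for line in value_lines:
        chrom, start, end, value = line.split()
        totals[(chrom, start, end)] += int(value)
    # Emit every original coordinate; missing keys default to 0.
    out = []
    for line in original_lines:
        chrom, start, end = line.split()
        out.append(f"{chrom}\t{start}\t{end}\t{totals[(chrom, start, end)]}")
    return out

# Usage with real files (hypothetical names):
# with open("original.bed") as orig, open("values.bed") as vals:
#     result = sum_bed_values(orig, vals)
```

This only matches coordinates that are byte-for-byte identical after splitting, which matches the example above; if the values can be floats, swap `int` for `float`.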


Source: (StackOverflow)

Python to change '|' into tab-delimited

I need to replace '|' with tabs so that I can analyze my human annotation genomic data (200+ MB). I'm a research assistant learning how to analyze/manipulate sequencing data in the easiest/simplest way so that I can replicate this on more data.

Here is how my data looks. There are ~400,000 lines of this type of data in one file.

       ANN=C|downstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|transcript|ENST00000606857|unprocessed_pseudogene||n.*1414T>C|||||1414|,C|intron_variant|MODIFIER|OR4G4P|ENSG00000268020|transcript|ENST00000594647|unprocessed_pseudogene|1/1|n.20-104T>C||||||;DP=11;SS=1;VT=SNP

I tried to use this code to replace '|' with '\t' for several lines.

import csv
infile = 'Book2.xlsx'
with open(infile , 'r') as inf: 
    for line in inf:    
        w =csv.writer(inf, delimiter = '\t')
        print w

All I'm getting is this :

<_csv.writer object at 0x7f8beebaafc8>
<_csv.writer object at 0x7f8beebaafc8>
<_csv.writer object at 0x7f8beebaafc8>
(the same line repeated for every line of the file)
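A possible fix (a sketch, with hypothetical filenames): the code above prints the `csv.writer` object itself rather than writing anything, it writes back into the handle it is reading from, and the input must be plain text rather than an .xlsx workbook. The csv module is not needed at all; a plain string replacement per line is enough.

```python
def pipes_to_tabs(in_path, out_path):
    # Read the plain-text input line by line and replace every '|' with a tab.
    with open(in_path) as inf, open(out_path, "w") as outf:
        for line in inf:
            outf.write(line.replace("|", "\t"))

# Usage (hypothetical filenames):
# pipes_to_tabs("annotations.txt", "annotations.tab")
```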

Source: (StackOverflow)

How to save Variant Call Format (VCF) file to disk in R using VariantAnnotation Package

I've searched the web for this without much luck. More or less you always get to the example from the VariantAnnotation Package. And since this example works fine on my computer I have no idea why the VCF I created does not.

The problem: I want to determine the number and location of SNPs in selected genes. I have a large VCF file (over 5GB) that has info on all SNPs on all chromosomes for several mice strains. Obviously my computer freezes if I try to do anything on the whole genome scale, so I first determined genomic locations of genes of interest on chromosome 1. I then used the VariantAnnotation Package to get only the data relating to my genes of interest out of the VCF file:

library(VariantAnnotation)
param<-ScanVcfParam(
  info=c("AC1","AF1","DP","DP4","INDEL","MDV","MQ","MSD","PV0","PV1","PV2","PV3","PV4","QD"), 
  geno=c("DP","GL","GQ","GT","PL","SP","FI"),
  samples=strain, 
  fixed="FILTER",
  which=gnrng
  )

The code above is taken out of a function I wrote which takes strain as an argument. gnrng refers to a GRanges object containing genomic locations of my genes of interest.

vcf<-readVcf(file, "mm10",param)

This works fine and I get my vcf (dim: 21783 1), but when I try to save it, it won't work:

file.vcf<-tempfile()
writeVcf(vcf, file.vcf)
Error in .pasteCollapse(ALT, ",") : 'x' must be a CharacterList

I even tried it side by side, running the example from the package first and then substituting my own VCF file:

#This is the example:
out1.vcf<-tempfile()
in1<-readVcf(fl,"hg19")
writeVcf(in1,out1.vcf)

This works just fine, but if I substitute my vcf for in1 I get the same error.

I hope I made myself clear... And any help will be greatly appreciated!! Thanks in advance!


Source: (StackOverflow)

How to compare two files quickly?

I need to be able to compare the two coordinates (the 2nd and 3rd word in a line) to see where they overlap. My code does this, but it does it very slowly. So far, for a file with 10,000 lines, my code takes about two minutes. I need to use it on a file with 3 billion lines, which I estimate will take forever. Is there a way to refactor my code to make it much faster?

So far I can do exactly what I want, which is this:

import os.path
with open("Output.txt", "w") as result:
  with open("bedgraph2.txt") as file1:
    for f1_line in file1:
      segment_1 = f1_line.split()
      with open("bedgraph1.txt") as file2:
        for f2_line in file2:
          segment_2 = f2_line.split()
          if (int(segment_1[2]) > int(segment_2[1])) & (int(segment_1[1]) < int(segment_2[2])):
            with open("Output.txt", "a") as add:
              add.write(segment_1[0])
              add.write(" ")
              add.write(segment_1[1])
              add.write(" ")
              add.write(segment_1[2])
              add.write(" ")
              add.write(segment_1[3])
              add.write(" | ")
              add.write(segment_2[0])
              add.write(" ")
              add.write(segment_2[1])
              add.write(" ")
              add.write(segment_2[2])
              add.write(" ")
              add.write(segment_2[3])
              add.write("\n")
            break

print "done"

This is a sample of the data

bedgraph2.txt
chr01   1780    1795    -0.811494
chr01   1795    1809    -1.622988
chr01   1809    1829    -2.434482
chr01   1829    1830    -3.245976
chr01   1830    1845    -2.434482
chr01   1845    1859    -1.622988
chr01   1859    1879    -0.811494
chr01   1934    1984    -0.811494
chr01   3550    3600    -0.811494
chr01   3790    3840    -0.811494
chr01   3882    3902    -0.811494
chr01   3902    3932    -1.622988


bedgraph1.txt
chr01   1809    1859    -1.139687
chr01   1965    2015    -1.139687
chr01   3790    3840    -1.139687
chr01   3930    3942    -1.139687
chr01   3942    3980    -2.279375
chr01   3980    3992    -1.139687
chr01   4260    4310    -1.139687
chr01   4361    4382    -1.139687
chr01   4382    4411    -2.279375
chr01   4411    4432    -1.139687
chr01   4473    4523    -1.139687
chr01   4605    4655    -1.139687

Thanks in advance
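The main cost in the code above is that bedgraph1.txt is rescanned from the top for every line of bedgraph2.txt, and Output.txt is reopened for every match. A sketch of a faster approach, assuming both files are sorted by start coordinate (as the samples are): read bedgraph1 once into memory and sweep both lists with a single advancing index, which makes the scan roughly linear instead of quadratic.

```python
def first_overlaps(lines1, lines2):
    # lines1: bedgraph2.txt lines; lines2: bedgraph1.txt lines (both sorted by start).
    rows2 = [l.split() for l in lines2]
    out = []
    j = 0
    for line in lines1:
        s1 = line.split()
        # Intervals that end at or before this start can never overlap
        # any later (sorted) query, so skip them permanently.
        while j < len(rows2) and int(rows2[j][2]) <= int(s1[1]):
            j += 1
        # Same overlap test as the original: end1 > start2 and start1 < end2.
        if j < len(rows2) and int(rows2[j][1]) < int(s1[2]):
            out.append(" ".join(s1[:4]) + " | " + " ".join(rows2[j][:4]))
    return out

# Usage (hypothetical, streaming the result to disk):
# with open("bedgraph2.txt") as f1, open("bedgraph1.txt") as f2, \
#         open("Output.txt", "w") as result:
#     result.write("\n".join(first_overlaps(f1, f2.readlines())) + "\n")
```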


Source: (StackOverflow)

Combining tables from different numbers of rows with a master MAP table

This dataset represents genome map positions (chr and start) with the summed sequencing coverage (depth) of each position across 20 individuals (dat).

Example:

gbsgre <- "chr start end depth
chr1 3273 3273 7
chr1 3274 3274 3
chr1 3275 3275 8
chr1 3276 3276 4
chr1 3277 3277 25"
gbsgre <- read.table(text=gbsgre, header=T)

These datasets represent genome map positions (V1 plus V2) with the individual coverage (V3) for each position.

Example:

df1 <- "chr start depth
        chr1 3273 4
        chr1 3276 4
        chr1 3277 15"
df1 <- read.table(text=df1, header=T)

df2 <- "chr start depth
        chr1 3273 3
        chr1 3274 3
        chr1 3275 8
        chr1 3277 10"

df2 <- read.table(text=df2, header=T)

dat <- NULL

dat[[1]] <- df1
dat[[2]] <- df2

> dat
[[1]]
   chr start depth
1 chr1  3273     4
2 chr1  3276     4
3 chr1  3277    15

[[2]]
   chr start depth
1 chr1  3273     3
2 chr1  3274     3
3 chr1  3275     8
4 chr1  3277    10

According to the chr and start positions in gbsgre, I need to cross the 20 depths (V3) of the 20 animals ([[1]] to [[20]]) with the main table (gbsgre) to generate a final table as follows: the first column will be the chromosome (V1), the second (V2) the start position, the third the depth (V3) from the gbsgre dataset, the fourth (V4) the depth (dat/V3) of [[1]] from dat, and so on, through the last column, which will be the depth of [[20]] from the dat dataset. Very importantly, missing data for the 20 individuals should be treated as zero ("0"), and the number of rows in the final table should be the same as in gbsgre.

#Example Result
> GBSMeDIP
chr start   depth   depth1  depth2
1: chr1 3273    7   4   3
2: chr1 3274    3   0   3 
3: chr1 3275    8   0   8 
4: chr1 3276    4   4   0 
5: chr1 3277    25  15  10
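The question is in R, but as a sketch of the logic (shown here in pandas, using the chr/start/depth columns of the toy data): left-join each per-animal table onto the master positions by (chr, start), then fill the missing depths with 0.

```python
import pandas as pd
from io import StringIO

# Toy data from the question (gbsgre's end column omitted for brevity).
gbsgre = pd.read_csv(StringIO(
    "chr start depth\nchr1 3273 7\nchr1 3274 3\nchr1 3275 8\n"
    "chr1 3276 4\nchr1 3277 25"), sep=r"\s+")
dat = [
    pd.read_csv(StringIO(
        "chr start depth\nchr1 3273 4\nchr1 3276 4\nchr1 3277 15"), sep=r"\s+"),
    pd.read_csv(StringIO(
        "chr start depth\nchr1 3273 3\nchr1 3274 3\nchr1 3275 8\nchr1 3277 10"),
        sep=r"\s+"),
]

# Left-join keeps every gbsgre row; positions absent from an animal get NaN,
# which fillna(0) turns into the required zeros.
result = gbsgre.copy()
for i, df in enumerate(dat, start=1):
    result = result.merge(df.rename(columns={"depth": f"depth{i}"}),
                          on=["chr", "start"], how="left")
result = result.fillna(0)
```

The same shape of solution in R would be repeated `merge(..., all.x=TRUE)` calls followed by replacing NA with 0.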

Source: (StackOverflow)

A fast way to get human genome sequence by coordinate

I want to get a lot of human genome fragments (more than 500 million of them) at random positions.

This is one part of the whole process. I have a .sam result file from bowtie, with 10 million human genome read alignments. I want to compare each query read with the reference sequence it aligned to, using the sam file. The reference sequence I used is hg19.fa from UCSC. So I need to be able to get the sequence from hg19.fa (or the chromosome files) using the location in the sam file.

e.g., given chr4:35654-35695, I could get the 42 bp sequence:

gtcttccagggtttttatatttttgggttttacacttaagt

So far, I have two solutions:

  1. A Python script that fetches sequences from the UCSC DAS server: http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr4:35654,35695
  2. A Python script that calls 'samtools faidx' and returns the command output, from this post: http://seqanswers.com/forums/showthread.php?t=23606&highlight=fetch+genome+coordinate

But they are slow. samtools faidx is a bit faster than the DAS server, but still slow.

So, is there any FAST way to do this? I have the separate chromosome FASTA files and the hg19.fa file.
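One approach that is fast for hundreds of millions of lookups is to parse the FASTA into memory once and slice Python strings directly, since each lookup is then a constant-time slice (a sketch; hg19 fits in a few GB of RAM, and `pysam.FastaFile` over an indexed hg19.fa is another common option):

```python
def load_fasta(handle):
    # Parse a FASTA file handle into {name: sequence}, keeping only the
    # first word of each header as the name (e.g. "chr4").
    seqs, name, parts = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(parts)
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if name is not None:
        seqs[name] = "".join(parts)
    return seqs

def fetch(seqs, chrom, start, end):
    # 1-based inclusive coordinates, as in "chr4:35654-35695".
    return seqs[chrom][start - 1:end]

# Usage (load once, then fetch millions of times):
# with open("hg19.fa") as fh:
#     genome = load_fasta(fh)
# fragment = fetch(genome, "chr4", 35654, 35695)
```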


Source: (StackOverflow)

How to determine characteristics for a genome?

In AI, are there any simple and/or very visual examples of how one could implement a genome into a simulation?

Basically, I'm after a simple walkthrough (not a tutorial, but rather something of a summarizing nature) which details how to implement a genome that changes the characteristics of an 'individual' in a simulation.

These genes would not be things like:

  • Mass
  • Strength
  • Length
  • Etc.

Rather, they should be the things that define the above, abstracting the genome from the actual characteristics of the inhabitants of the simulation.

Am I clear enough on what I want?

Anyway, if there's any way that you have tried that's better, and that implements evolution in a form like these sexual swimmers, then by all means, go ahead and post it! The more fun inspiration the better :)
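A minimal sketch of the idea (every gene name and formula here is invented for illustration): the genome stores abstract parameters, the observable characteristics are *derived* from them in an expression step, and mutation acts on the genes rather than on the traits directly.

```python
import random

class Genome:
    # Hypothetical indirect encoding: genes are abstract parameters,
    # e.g. [growth, density, elongation], not traits themselves.
    def __init__(self, genes):
        self.genes = genes

    def mutate(self, rate=0.1):
        # Mutation perturbs genes; the traits only change via expression.
        return Genome([g + random.gauss(0, rate) for g in self.genes])

def express(genome):
    # "Gene expression": derive the individual's characteristics
    # from the abstract parameters (formulas are arbitrary examples).
    growth, density, elongation = genome.genes
    return {
        "mass": 10 * growth * (1 + elongation),
        "strength": 5 * density * growth,
        "length": 2 + 3 * elongation,
    }
```

The point of the indirection is that one gene (say, growth) can influence several traits at once, which is what gives evolution in the simulation its interesting correlated effects.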


Source: (StackOverflow)

Python Regex to Extract Genome Sequence

I’m trying to use a Python Regular Expression to extract a genome sequence from a genome database; I’ve pasted a snippet of the database below.

>GSVIVT01031739001 pacid=17837850 polypeptide=GSVIVT01031739001 locus=GSVIVG01031739001 ID=GSVIVT01031739001.Genoscope12X annot-version=Genoscope.12X ATGAAAACGGAACTCTTTCTAGGTCATTTCCTCTTCAAACAAGAAAGAAGTAAAAGTTGCATACCAAATATGGACTCGAT TTGGAGTCGTAGTGCCCTGTCCACAGCTTCGGACTTCCTCACTGCAATCTACTTCGCCTTCATCTTCATCGTCGCCAGGT TTTTCTTGGACAGATTCATCTATCGAAGGTTGGCCATCTGGTTATTGAGCAAGGGAGCTGTTCCATTGAAGAAAAATGAT GCTACACTGGGAAAAATTGTAAAATGTTCGGAGTCTTTGTGGAAACTAACATACTATGCAACTGTTGAAGCATTCATTCT TGCTATTTCCTACCAAGAGCCATGGTTTAGAGATTCAAAGCAGTACTTTAGAGGGTGGCCAAATCAAGAGTTGACGCTTC CCCTCAAGCTTTTCTACATGTGCCAATGTGGGTTCTACATCTACAGCATTGCTGCCCTTCTTACATGGGAAACTCGCAGG AGGGATTTCTCTGTGATGATGTCTCATCATGTAGTCACTGTTATCCTAATTGGGTACTCATACATATCAAGTTTTGTCCG GATCGGCTCAGTTGTCCTTGCCCTGCACGATGCAAGTGATGTCTTCATGGAAGCTGCAAAAGTTTTTAAATATTCTGAGA AGGAGCTTGCAGCAAGTGTGTGCTTTGGATTTTTTGCCATCTCATGGCTTGTCCTACGGTTAATATTCTTTCCCTTTTGG GTTATCAGTGCATCAAGCTATGATATGCAAAATTGCATGAATCTATCGGAGGCCTATCCCATGTTGCTATACTATGTTTT CAATACAATGCTCTTGACACTACTTGTGTTCCATATATACTGGTGGATTCTTATATGCTCAATGATTATGAGACAGCTGA AAAATAGAGGACAAGTTGGAGAAGATATAAGATCTGATTCAGAGGACGATGAATAG
>GSVIVT01031740001 pacid=17837851 polypeptide=GSVIVT01031740001 locus=GSVIVG01031740001 ID=GSVIVT01031740001.Genoscope12X annot-version=Genoscope.12X ATGGGTATTACTACTTCCCTCTCATATCTTTTATTCTTCAACATCATCCTCCCAACCTTAACGGCTTCTCCAATACTGTT TCAGGGGTTCAATTGGGAATCATCCAAAAAGCAAGGAGGGTGGTACAACTTCCTCATCAACTCCATTCCTGAACTATCTG CCTCTGGAATCACTCATGTTTGGCTTCCTCCACCCTCTCAGTCTGCTGCATCTGAAGGGTACCTGCCAGGAAGGCTTTAT GATCTCAATGCATCCCACTATGGTACCCAATATGAACTAAAAGCATTGATAAAGGCATTTCGCAGCAATGGGATCCAGTG CATAGCAGACATAGTTATAAACCACAGGACTGCTGAGAAGAAAGATTCAAGAGGAATATGGGCCATCTTTGAAGGAGGAA CCCCAGATGATCGCCTTGACTGGGGTCCATCTTTTATCTGCAGTGATGACACTCTTTTTTCTGATGGCACAGGAAATCCT GATACTGGAGCAGGCTTCGATCCTGCTCCAGACATTGATCATGTAAACCCCCGGGTCCAGCGAGAGCTATCAGATTGGAT GAATTGGTTAAAGATTGAAATAGGCTTTGCTGGATGGCGATTCGATTTTGCTAGAGGATACTCCCCAGATTTTACCAAGT TGTATATGGAAAACACTTCGCCAAACTTTGCAGTAGGGGAAATATGGAATTCTCTTTCTTATGGAAATGACAGTAAGCCA AACTACAACCAAGATGCTCATCGGCGTGAGCTTGTGGACTGGGTGAAAGCTGCTGGAGGAGCAGTGACTGCATTTGATTT TACAACCAAAGGGATACTCCAAGCTGCAGTGGAAGGGGAATTGTGGAGGCTGAAGGACTCAAATGGAGGGCCTCCAGGAA TGATTGGCTTAATGCCTGAAAATGCTGTGACTTTCATAGATAATCATGACACAGGTTCTACACAAAAAATTTGGCCATTC CCATCAGACAAAGTCATGCAGGGATATGTTTATATCCTCACTCATCCTGGGATTCCATCCATATTCTATGACCACTTCTT TGACTGGGGTCTGAAGGAGGAGATTTCTAAGCTGATCAGTATCAGGACCAGGAACGGGATCAAACCCAACAGTGTGGTGC GTATTCTGGCATCTGACCCAGATCTTTATGTAGCTGCCATAGATGAGAAAATCATTGCTAAGATTGGACCAAGGTATGAT GTTGGGAACCTTGTACCTTCAACCTTCAAACTTGCCACCTCTGGCAACAATTATGCTGTGTGGGAGAAACAGTAA
>GSVIVT01031741001 pacid=17837852 polypeptide=GSVIVT01031741001 locus=GSVIVG01031741001 ID=GSVIVT01031741001.Genoscope12X annot-version=Genoscope.12X ATGTCCAAATTAACTTATTTATTATCTCGGTACATGCCAGGAAGGCTTTATGATCTGAATGCATCCAAATATGGCACCCA AGATGAACTGAAAACACTGATAAAGGTGTTTCACAGCAAGGGGGTCCAGTGCATAGCAGACATAGTTATAAACCACAGAA CTGCAGAGAAGCAAGACGCAAGAGGAATATGGCCATCTTTGAAGGAGGAACCCCAGATGATCGCCTTGACTGGACCCCAT CTTTCCTTTGCAAGGACGACACTCCTTATTCCGACGGCACCGGAAACCCTGATTCTGGAGATGACTACAGTGCCGCACCA GACATCGACCACATCAACCCACGGGTTCAGCAAGAGCTAA

What I’m trying to do is get the genome (ACGT) sequence for GSVIVT01031740001 (the middle sequence), and none of the others. My current regex is

sequence = re.compile('(?<=>GSVIVT01031740001) pacid=.*annot-version=.*\n[ACGT\n]*[^(?<!>GSVIVT01031740001) pacid]’)

with my logic being find the header with the genbank ID for the correct organism, give me that line, then go to a new line and give me all ACGT and new lines until I get to a header for an organism with a different genbank ID. This fails to give any results.

Yes, I know that re.compile doesn’t actually perform a search; I’m searching against a file opened as ‘target’ so my execution looks like

>>> for nucl in target:
...     if re.search(sequence, nucl):
...         print(nucl)

Can someone tell me what I’m doing wrong, either in my regex or by using regex in the first place? When I try this on regex101.com, it works, but when I try it in the Python interpreter (2.7.1), it fails.

Thanks!
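For what it's worth, one issue with the approach above is that the search runs line by line, so a pattern can never span the header line and the sequence lines that follow it. One way around that is to match against the whole file, anchoring on the wanted header and using a lookahead for the next '>' header or end of file (a sketch, using a shortened toy stand-in for the database):

```python
import re

# Toy snippet standing in for the database file (sequences shortened).
fasta = (
    ">GSVIVT01031739001 pacid=17837850 annot-version=Genoscope.12X\n"
    "ATGAAAACGG\n"
    ">GSVIVT01031740001 pacid=17837851 annot-version=Genoscope.12X\n"
    "ATGGGTATTA\nCTACTTCCCT\n"
    ">GSVIVT01031741001 pacid=17837852 annot-version=Genoscope.12X\n"
    "ATGTCCAAAT\n"
)

# Match the wanted header line, then capture sequence characters and
# newlines up to the next '>' header or the end of the string.
pattern = re.compile(
    r"^>GSVIVT01031740001[^\n]*\n([ACGTN\n]+?)(?=^>|\Z)",
    re.MULTILINE,
)
sequence = pattern.search(fasta).group(1).replace("\n", "")
```

With the real file this means calling `pattern.search(target.read())` once instead of looping over lines; for anything beyond a one-off, a FASTA parser (e.g. Biopython's SeqIO) is usually less fragile than a regex.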


Source: (StackOverflow)

Conversion of Bowtie and Sam format alignment files

I need my alignment files in both bowtie and SAM format so that I can feed them into different programs later in my pipeline. Is there any method I can use to convert a sam alignment file into a bowtie alignment file and vice versa?

An alternative would be to do the alignment twice and get the bowtie program to output it in different formats in each case. However, this wastes too much time.


Source: (StackOverflow)

Merge specific rows from two files if number in row file 1 is between two numbers in row in file 2

I've been searching for a couple of hours (actually, two days already), but I can't find an answer to my problem. I've tried sed and awk, but I can't get the parameters right.

Essentially, this is what I'm looking for

FOR every line in file_1
IF [value in colum2 in file_1]
   IS EQUAL TO [value in column 4 in some row in file_2]
   OR IS EQUAL TO [value in column 5 in some row in file_2]
   OR IS BETWEEN [value column 4 and value column 5 in some row in file_2]
THEN
    ADD column 3, 6 and 7 of some row of file_2 to column 3, 4 and 5 of file_1

NB: values that need to be compared are INTs; values in columns 3, 6 and 7 (which only need to be copied) are STRINGs

And this is the context, but probably not necessary to read:


I have two files with genome data which I want to merge in a specific way (the columns are tab separated)

  • The first file contains variants (only SNPs, for those interested) of which, effectively, only the second column is relevant. This column is a list of numbers (the position of each variant on the chromosome)
  • I have a structural annotation file that contains the following data:
    • In column 4 is a begin position of the specific structure and in column 5 is the end position.
    • Column 3, 7 and 9 contains information that describes the specific structure (name of a gene etc.)

I would like to annotate the variants in the first file with the data in the annotation file. Therefore, if the number in column 2 of the variants file is equal to column 4 or 5, or between those values in a specific row, columns 3, 7 and 9 of that specific row in the annotation need to be added.


Sample File 1

SOME_NON_RELEVANT_STRING    142
SOME_NON_RELEVANT_STRING    182
SOME_NON_RELEVANT_STRING    320
SOME_NON_RELEVANT_STRING    321
SOME_NON_RELEVANT_STRING    322
SOME_NON_RELEVANT_STRING    471
SOME_NON_RELEVANT_STRING    488
SOME_NON_RELEVANT_STRING    497
SOME_NON_RELEVANT_STRING    541
SOME_NON_RELEVANT_STRING    545
SOME_NON_RELEVANT_STRING    548
SOME_NON_RELEVANT_STRING    4105
SOME_NON_RELEVANT_STRING    15879
SOME_NON_RELEVANT_STRING    26534
SOME_NON_RELEVANT_STRING    30000
SOME_NON_RELEVANT_STRING    30001
SOME_NON_RELEVANT_STRING    40001
SOME_NON_RELEVANT_STRING    44752
SOME_NON_RELEVANT_STRING    50587
SOME_NON_RELEVANT_STRING    87512
SOME_NON_RELEVANT_STRING    96541
SOME_NON_RELEVANT_STRING    99541
SOME_NON_RELEVANT_STRING    99871

Sample File 2

SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A1  0   38  B1  C1
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A2  40  2100    B2  C2
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A3  2101    9999    B3  C3
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A4  10000   15000   B4  C4
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A5  15001   30000   B5  C5
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A6  30001   40000   B6  C6
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A7  40001   50001   B7  C7
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A8  50001   50587   B8  C8
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A9  50588   83054   B9  C9
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A10 83055   98421   B10 C10
SOME_NON_RELEVANT_STRING    SOME_NON_RELEVANT_STRING    A11 98422   99999   B11 C11

Sample output file

142 A2  B2  C2
182 A2  B2  C2
320 A2  B2  C2
321 A2  B2  C2
322 A2  B2  C2
471 A2  B2  C2
488 A2  B2  C2
497 A2  B2  C2
541 A2  B2  C2
545 A2  B2  C2
548 A2  B2  C2
4105    A3  B3  C3
15879   A5  B5  C5
26534   A5  B5  C5
30000   A5  B5  C5
30001   A6  B6  C6
40001   A7  B7  C7
44752   A7  B7  C7
50587   A8  B8  C8
87512   A10 B10 C10
96541   A10 B10 C10
99541   A11 B11 C11
99871   A11 B11 C11
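The pseudocode above can be sketched in Python as follows (the question asks about sed/awk; this only illustrates the matching logic, and the function name is hypothetical). Column numbers follow the sample files: file_2 columns 4 and 5 are begin/end, and columns 3, 6, 7 carry the annotation strings. A single `begin <= pos <= end` test covers the "equal to column 4", "equal to column 5", and "between" cases at once.

```python
def annotate(variant_lines, annotation_lines):
    # Build (begin, end, name, info1, info2) tuples from file_2
    # (0-based indices 3, 4, 2, 5, 6 = 1-based columns 4, 5, 3, 6, 7).
    intervals = []
    for line in annotation_lines:
        c = line.split()
        intervals.append((int(c[3]), int(c[4]), c[2], c[5], c[6]))
    out = []
    for line in variant_lines:
        pos = int(line.split()[1])  # column 2 of file_1
        for begin, end, *info in intervals:
            # Inclusive bounds: equal-to-begin, equal-to-end, or between.
            if begin <= pos <= end:
                out.append("\t".join([str(pos), *info]))
                break
    return out
```

An awk one-liner would follow the same shape: load file_2 into arrays in a `NR==FNR` block, then test `$2` of file_1 against each begin/end pair.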


Source: (StackOverflow)

find the same part in a string

I have a string such as: abcgdfabc

I want to do the following. Input: a string, e.g.:

abcgdfabc

Output: a dict (the key is a "word", and the value is the number of times it shows up):

abc:2
gdf:1

The "words" should be of maximum length; it should be a greedy match.

I have spent a lot of time on it and can't figure it out. The string is longer than 5000 characters; it's a genome sequence, and we want to find relationships within it. As a first step we have to build such a dict to make the data clearer. Please help.


Source: (StackOverflow)

Equal genomic intervals between samples

I would like to find the exact same genomic intervals shared between samples (NE_id).

My Input:

chr  start_call   end_call  NE_id 
chr1    150         200      NE01
chr1    150         200      NE02
chr2    100         150      NE01
chr2    100         160      NE02
chr3    200         300      NE01   
chr3    200         300      NE02

My expected output:

chr  start_call   end_call  NE_id 
chr1    150         200      NE01, NE02   
chr3    200         300      NE01, NE02

In this example the chr2 genomic intervals have some overlap; however, they don't correspond to the exact same genomic interval (size difference == 10).

Thank you very much.
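As a sketch of the logic (shown here in pandas for illustration): group on the full (chr, start_call, end_call) triple, concatenate the sample IDs, and keep only groups seen in more than one sample. Intervals that merely overlap, like the chr2 pair, fall into different groups and are dropped automatically.

```python
import pandas as pd
from io import StringIO

# The input table from the question.
data = """chr start_call end_call NE_id
chr1 150 200 NE01
chr1 150 200 NE02
chr2 100 150 NE01
chr2 100 160 NE02
chr3 200 300 NE01
chr3 200 300 NE02"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

# Exact interval identity == identical (chr, start, end) key.
shared = (df.groupby(["chr", "start_call", "end_call"])["NE_id"]
            .agg(", ".join)
            .reset_index())
# Keep only intervals carrying more than one sample ID.
shared = shared[shared["NE_id"].str.contains(",")]
```

The same grouping idea works in R with `aggregate()` or data.table on the three key columns.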


Source: (StackOverflow)