Help with bioinformatics for next-generation sequencing

ryan_m

I came across this forum (SeqAnswers) which focuses on the bioinformatics for handling/using data from next-generation sequencing platforms. Of course, anyone with problems or questions in that area could post here and find more than a few people working in this area who could probably advise them!

Ryan

khan

I am new to SNP genotyping. Could someone help me understand it? For example, if you discover 13,000 to 15,000 SNPs in EST sequences through 454 sequencing, how do you then genotype them in a population of 50 individuals? Do you need to design 13,000 to 15,000 primers, and which genotyping method would be the most cost-effective?

Thank you for your time and effort

ryan_m

Hi Khan.
Once you have your SNPs of interest, it is better to move to a highly parallel genotyping assay such as those provided by Illumina. Many companies will design and run the arrays for you; see, for example, this site.

Regards,

Ryan

khan

Thanks for your reply. I would appreciate it if you could help me understand how it works. Suppose I found three thousand SNPs in EST sequences. Do I now have to design three thousand primers and use them in multiplex, as in the CMMT genotyping assays?
Thanks

ryan_m

khan wrote:

Thanks for your reply. I would appreciate it if you could help me understand how it works. Suppose I found three thousand SNPs in EST sequences. Do I now have to design three thousand primers and use them in multiplex, as in the CMMT genotyping assays?
Thanks

That is the basic idea, yes. But many companies would design the probes for you, so probably all you would need to supply would be the positions of the polymorphisms and the two alleles.

JayM

I have worked with 454 data for transcriptome analysis and SNPs, and working with that data is not a big problem. Now the shift from 454 to Solexa seems daunting (we recently acquired the sequencer). Can anyone recommend assembly software? For 454 I used CodonCode Aligner and it did what I wanted; the problem is that it has limitations on memory and RAM settings, which makes it a problem for Solexa.

I am looking for software that is robust enough to handle Solexa data without having to stretch its capabilities too much.

zee

We have written some software specifically for this. You can get it for free research/nonprofit use at www.novocraft.com

ryan_m

zee wrote:

We have written some software specifically for this. You can get it for free research/nonprofit use at www.novocraft.com

In my experience, novoalign can consume upwards of 15 gigabytes of RAM when mapping solexa reads to human, though you can change some parameters when creating your index to reduce this. For the people reading this thread, zee, could you let us know what the suggested lower RAM limit is for running novoalign with solexa reads against the human reference genome?

Thanks,

Ryan

sparks

Hi Ryan,
Using a 14-mer index with a step of 3, the human genome can be indexed in approximately 6 GB of RAM, and novoalign and novopaired will then run quite happily on a workstation with 8 GB of RAM. By default, novoindex looks at how much RAM a server has and then chooses the k-mer length and step size to give optimum performance on that server. If your server has 16 GB of RAM, that might mean building a 12 GB index, but you can always specify k and s yourself and have a 6 GB index on a 16 GB server.
Up to a limit (4^k < genome length / s), a larger k and a smaller s will improve performance.
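
To make that limit concrete, here is a rough Python sketch that finds the largest k satisfying 4^k < genome length / s for a given step size. The function name `max_useful_k` and the genome length of about 3.1 billion bases are assumptions for illustration only; this evaluates the rule of thumb above and is not how novoindex itself chooses its parameters.

```python
def max_useful_k(genome_length: int, step: int) -> int:
    """Return the largest k with 4**k < genome_length / step.

    Hypothetical helper: it only evaluates the rule of thumb from the post.
    """
    if genome_length <= 0 or step <= 0:
        raise ValueError("genome_length and step must be positive")
    k = 0
    # 4**k < genome_length / step is equivalent to 4**k * step < genome_length,
    # which keeps the comparison in exact integer arithmetic.
    while 4 ** (k + 1) * step < genome_length:
        k += 1
    return k


# Assumed approximate size of the human reference genome, in bases.
HUMAN_GENOME_LENGTH = 3_100_000_000

for step in (1, 2, 3):
    print(f"step={step}: largest k within the limit = "
          f"{max_useful_k(HUMAN_GENOME_LENGTH, step)}")
```

Under that assumed genome length, a 14-mer with a step of 3 sits right at the limit. Whether a given index actually fits in RAM is a separate question that this sketch does not answer.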
Colin

ryan_m

Thanks for the details, Colin. And just to confirm: the parameters used when the index is created do not affect the quality of the results (e.g. by causing some missed alignments); they just lead to increased runtime to complete the process, correct?

Thanks again.
Ryan

sparks

Hi Ryan,
You're right, the index k-mer length and step size really only affect runtime performance. They shouldn't affect the alignment location of a read.
Colin

G_nome

Hi Sparks
I have a dual quad-core MacPro (64-bit Xeon processors) with 16G of RAM. This should be able to run novoalign just fine on the human genome, but I am having trouble getting novoindex to run. It seems that novoindex 'thinks' it is on a 32-bit machine and is complaining about memory limitations. Here is a piece of the error output:

Error: Sequence Index cannot fit in available RAM
Error: RAM available: 2048Mb
Error: Minimum RAM req'd: 4027Mb

Is there some way to get around this? Have others out there had success running novoindex and/or novoalign on a Mac?

sparks

Hi G_nome,

The problem is in determining how much memory is available. Can you tell me what version you are using?

If you specify the k and s parameters then it shouldn't be a problem. On a 16 GB server and using the human reference genome you could use either -k14 -s1 or -k15 -s2.

Best Regards, Colin

G_nome

sparks wrote:

Hi G_nome,

The problem is in determining how much memory is available. Can you tell me what version you are using?

If you specify the k and s parameters then it shouldn't be a problem. On a 16 GB server and using the human reference genome you could use either -k14 -s1 or -k15 -s2.

Best Regards, Colin

Thank you for your fast reply, Colin. The novoindex version is 1.5. As per your suggestion, it seems to work OK with -k15 -s2.

Regards,
Sean

G_nome

Hi Again.
I am not getting alignments as quickly as I would have expected from the rough benchmarks (and comparisons to Maq and Eland). I started 8 novopaired jobs (on an 8-cpu machine) a week ago (one lane of data for each job). Some jobs are 42-bp reads and some are 76-bp reads. Currently, each job has aligned between 1 and 5 million reads. Each lane has about 20 million reads (10 million pairs), so it is looking like I have many more weeks to wait. Am I doing something wrong? The only non-default option I am using is (-Q 30), hoping that would provide a speed-up by ignoring low quality alignments. By the way, this is on a MacPro with 16G of memory.
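
As a back-of-the-envelope check on those numbers, here is a small Python sketch of the remaining wait per job. It assumes the alignment rate stays constant, which may well not hold if slow reads are unevenly distributed; the function name `weeks_remaining` is just for illustration.

```python
def weeks_remaining(total_reads: int, aligned_reads: int, weeks_elapsed: float) -> float:
    """Estimate the weeks left for one job, assuming a constant alignment rate."""
    if aligned_reads <= 0 or weeks_elapsed <= 0:
        raise ValueError("need some progress and elapsed time to estimate a rate")
    reads_per_week = aligned_reads / weeks_elapsed
    return (total_reads - aligned_reads) / reads_per_week


READS_PER_LANE = 20_000_000  # about 20 million reads (10 million pairs) per lane

# After one week, jobs have aligned between 1 and 5 million reads.
slowest = weeks_remaining(READS_PER_LANE, 1_000_000, 1)
fastest = weeks_remaining(READS_PER_LANE, 5_000_000, 1)
print(f"Estimated weeks remaining: {fastest:.0f} (fastest job) to {slowest:.0f} (slowest job)")
```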

I appreciate your help.

Sean

sparks

Hi Sean,

The -Q 30 option won't speed up anything as it is a post alignment filter on quality.
To speed up alignment on longer reads I suggest using the -t option. On longer paired end reads the calculated default threshold may allow up to 8 mismatches per end and this can result in slow performance for some reads.
For paired end you could try using -t 150 or -t 180, restricting alignments to 5 or 6 mismatches at high quality base positions (and more at low quality positions).
You could also use the -l option to filter out any low quality reads. On a human-size genome we'll normally try to align any read that has at least ~20 good bases (actually 40 bits of information, using Shannon entropy and base qualities). For long reads this is probably a bit low; try -l 30 on the 40bp reads and -l 50 on the 76bp reads. This will reject a few more reads as low quality.
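
As a rough illustration of how the -t values above relate to mismatch counts, the figures quoted (150 for 5 mismatches, 180 for 6) imply a cost of about 30 per mismatch at a high quality base. The Python sketch below only encodes that inferred ratio. The constant and function names are hypothetical, and this is not novoalign's actual scoring, which also depends on base qualities.

```python
# Assumption inferred from the figures above: -t 150 allows about 5 mismatches
# and -t 180 about 6 at high quality positions, i.e. roughly 30 per mismatch.
HIGH_QUALITY_MISMATCH_COST = 30


def max_high_quality_mismatches(threshold: int) -> int:
    """Approximate number of high quality mismatches a -t threshold allows."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return threshold // HIGH_QUALITY_MISMATCH_COST


def threshold_for_mismatches(mismatches: int) -> int:
    """Approximate -t value that allows the given number of high quality mismatches."""
    if mismatches < 0:
        raise ValueError("mismatches must be non-negative")
    return mismatches * HIGH_QUALITY_MISMATCH_COST


for t in (150, 180):
    print(f"-t {t}: about {max_high_quality_mismatches(t)} high quality mismatches")
```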
Please let me know if you have any success.

Thanks, Colin

lizz

Hello, this is lizz. Can you tell me how to change the theme in Windows 7?