Updating annotations in light of new data
Additional data can improve an existing annotation. mRNA-Seq data from additional tissues, developmental time points, and experimental conditions can provide evidence for genes that are expressed only in specific tissues, at specific developmental time points, or under certain experimental conditions. All genome annotation projects that I am aware of start before mRNA-Seq data are available for every combination of tissue, developmental time point, and experimental condition. As work continues on a given organism, more and more data of this type become available, and updating the original annotation becomes desirable. You can use MAKER to do this!
This is what it looks like in the maker_opts.ctl file to update a subset of an annotation set in light of new data.
genome=yourgenome.fasta
Make sure that the assembly has not changed since the original annotation was made.
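One way to confirm that the assembly is unchanged is to compare per-sequence checksums of the FASTA file used for the original annotation against the one you are about to use. The following is a minimal sketch and not part of MAKER; the function names `fasta_digests` and `assembly_unchanged` are hypothetical helpers. It assumes that differences in line wrapping and soft-masking (lowercase bases) should be ignored.

```python
import hashlib


def fasta_digests(lines):
    """Map each sequence name to a SHA-256 digest of its uppercased sequence.

    `lines` is any iterable of FASTA lines (for example an open file).
    Line wrapping and letter case do not affect the digest.
    """
    digests = {}
    name, hasher = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the digest of the previous record before starting a new one.
            if name is not None:
                digests[name] = hasher.hexdigest()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            hasher = hashlib.sha256()
        elif name is not None:
            hasher.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = hasher.hexdigest()
    return digests


def assembly_unchanged(old_lines, new_lines):
    """Return True if both FASTA inputs have the same names and sequences."""
    return fasta_digests(old_lines) == fasta_digests(new_lines)
```

To use it, open both files and pass the file objects, for example `assembly_unchanged(open("original_genome.fasta"), open("yourgenome.fasta"))`, where `original_genome.fasta` is a placeholder for the file used in the first annotation run.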
est=newdata.fasta #new data (de novo assembled transcript assembly)
This is where you put your new data if it is in FASTA format. It will be aligned to the genome with BLASTN.
est_gff=newdata.gff # new data in gff3 format (alignment based transcript assembly).
This is where you put your new data if it is in GFF3 format. Make sure that the coordinates of the features in this file match the genome that was originally annotated. If the underlying genomic sequence has changed, the coordinates will be wrong and could seriously damage your gene models.
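A quick sanity check before running MAKER is to confirm that every feature in the GFF3 file refers to a sequence present in the genome and stays within that sequence's length. The sketch below is an illustration, not a MAKER utility; `fasta_lengths` and `gff3_problems` are hypothetical names. It only catches gross mismatches (unknown sequence IDs, out-of-range coordinates). It cannot detect a sequence that changed internally while keeping its name and length.

```python
def fasta_lengths(lines):
    """Return a dict mapping each FASTA sequence name to its length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif line and name is not None:
            lengths[name] += len(line)
    return lengths


def gff3_problems(gff_lines, lengths):
    """Return (line_number, message) pairs for features that do not fit the genome."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        seqid = cols[0]
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            problems.append((number, "start or end is not an integer"))
            continue
        if seqid not in lengths:
            problems.append((number, f"unknown sequence '{seqid}'"))
        elif start < 1 or start > end or end > lengths[seqid]:
            problems.append((number, f"coordinates {start}-{end} outside 1-{lengths[seqid]}"))
    return problems
```

An empty list from `gff3_problems(open("newdata.gff"), fasta_lengths(open("yourgenome.fasta")))` means no obvious mismatch was found.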
model_org=all #use the same repeat masking options as the original annotation
rmlib=custom_lib.fasta #use the same repeat masking options as the original annotation
repeat_protein=te_proteins.fasta #use the same repeat masking options as the original annotation
Keeping the same repeat masking parameters is important if you are aligning your new data to the assembly with the est= option. If you are using est_gff=, you do not need to repeat mask at all.
pred_gff=annotations_to_update.gff #a gff3 file with the annotations you wish to update
These annotations will be given to MAKER as predictions, allowing MAKER to modify the gene models to better match the evidence. Most commonly, MAKER will add three prime and five prime UTRs.
model_gff=annotation_to_not_change #a gff3 file with annotation you do not want changed
These annotations will not be changed. This is where you would put manually curated genes to make sure they were not modified in the update.
keep_preds=1 #Add unsupported gene prediction to final annotation set, 1 = yes, 0 = no
This is to make sure that you don't lose the genes that are not supported by the new evidence. If you kept them in the annotation set in the first place you must have had a reason.
map_forward=1 #map names and attributes forward from old GFF3 genes
Keep the information associated with the genes so you can find them again.
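If you prefer to apply these settings with a script rather than by hand, the options above can be written into an existing control file programmatically. The following is a rough sketch, assuming the control file uses one `key=value #comment` entry per line as in the examples above; `set_ctl_options` is a hypothetical helper, not something MAKER provides.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given option values replaced.

    Trailing '#comment' text on each option line is preserved. A KeyError is
    raised if an option in `updates` does not appear in the control file.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        # Option lines look like 'key=value #comment'; section headers start with '#'.
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={remaining.pop(key)} {comment}".rstrip()
        out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# The update settings described above, using the file names from this page.
UPDATE_SETTINGS = {
    "genome": "yourgenome.fasta",
    "est_gff": "newdata.gff",
    "pred_gff": "annotations_to_update.gff",
    "keep_preds": "1",
    "map_forward": "1",
}
```

Read maker_opts.ctl into a string, pass it through `set_ctl_options(text, UPDATE_SETTINGS)`, and write the result back. Raising an error for an unknown option is a deliberate choice so that a misspelled key does not go unnoticed.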
Please post questions to the maker_dev list