Paper Title
In Silico Analysis Reveals Multiple Genes of MAPK, MAPKK and MAPKKK Gene Families of the MAPK Cascade in Nicotiana tabacum

Abstract
Tobacco is an important cash crop and a model plant studied in detail for mechanisms as fundamental as growth and development. The mitogen-activated protein kinase (MAPK) cascade comprises gene families with multiple roles throughout plant life as well as under different stress conditions. This study is based on in silico identification, sequence analysis and phylogenetic analysis of MAPKs retrieved from the tobacco genome. A total of 162 MAPK genes, including 30 MAPK, 10 MAPK kinase, and 122 MAPK kinase kinase genes, were identified from the tobacco genome based on similarity and signature motifs. The presence of other domains and motifs, and the physical properties, were predicted for the identified MAPKs, followed by generation of a phylogenetic tree based on multiple sequence alignment. In some tobacco MAPKs, additional domains were found alongside the protein kinase domain; these proteins are orthologous to Arabidopsis MAPKs carrying the same additional domains. The identified sequences varied in their properties, including gene and protein lengths, molecular weights, isoelectric points and subcellular localizations. Phylogenetic analysis confirmed clustering of tobacco MAPKs according to the Arabidopsis MAPK nomenclature. Bootstrap replications were used to determine the reliability of these results. However, quantitative phylogenetic analysis showed only slight similarities, and only in rare cases, among the mRNA sequences of the genes in these families.

Keywords: NtMAPK, mitogen-activated protein kinase, MKK, MEKK, Raf, ZIK, Phylogenetic tree, heatmap, Plant Genome.

I. INTRODUCTION

Tobacco is an important cash crop of the Solanaceae, a major plant family that also contains crops such as tomato, potato, eggplant and pepper. Nicotiana tabacum is tetraploid, which accounts for its larger genome size in comparison to other species in this family [1-3].
With a large genome of approximately 4.5 Gb, roughly three quarters of which consists of repeats, tobacco is the result of hybridization between Nicotiana sylvestris and Nicotiana tomentosiformis [4-6]. Sequence duplications in the tobacco genome are thus a result of this tetraploidization event [7]. Shuffling of DNA via transpositions and reduction in genome size via deletions have been seen to occur frequently in allopolyploids [8-10]. Polyploidization events at various times in the genus Nicotiana led to the formation of allotetraploid species, which constitute approximately half of the genus; the genus is therefore considered to have properties ideal for model studies on polyploidization [2, 4, 11]. Tobacco is also considered a model plant for allopolyploidy studies because it formed from its ancestral species very recently, less than 0.2 million years ago [4, 12]. Young allopolyploids undergo extensive changes in their genomic composition, starting right after the hybridization event, with taxa falling into groups that differ in the extent of these changes [13-16]. Given the features that make tobacco important, the characterization of genes from the tobacco genome seems essential, and the identification of tobacco genes appears to be a fundamental first step in that process. This study is based on the identification of mitogen-activated protein kinases from the sequenced genome of commercial tobacco, Nicotiana tabacum; in Arabidopsis, these kinases are known to function in all phases of the life of the plant [17, 18]. MAPKs are also known to play central roles when the plant is coping with stress [19-21], whether caused by harsh environmental conditions or by microbes and insects that result in phenotypic changes, and they are therefore a center of attention for the better understanding of plant mechanisms.
MAPK is a large cascade based on families including Mitogen-activated protein kinases (MPK), Mitogen-activated protein kinase kinases (MKK), and Mitogen-activated