Human genome


The human genome is the complete race of nucleic acid sequences for humans, encoded as DNA within a 23 chromosome pairs in cell nuclei as well as in a small DNA molecule found within individual mitochondria. These are ordinarily treated separately as the nuclear genome as well as the mitochondrial genome. Human genomes add both protein-coding DNA genes in addition to noncoding DNA. Haploid human genomes, which are contained in germ cells the egg and sperm gamete cells created in the meiosis phase of sexual reproduction ago fertilization consist of 3,054,815,472 DNA base pairs if X chromosome is used, while female diploid genomes found in somatic cells shit twice the DNA content.

While there are significant differences among the genomes of human individuals on the profile of 0.1% due to single-nucleotide variants and 0.6% when considering indels, these are considerably smaller than the differences between humans and their closest living relatives, the bonobos and chimpanzees ~1.1% fixed single-nucleotide variants and 4% when including indels. Size in basepairs can draw adjustments to too: telomeres size is decreasing after every duplication of chromosomes.

Although the sequence of the human genome has been totally determined by DNA sequencing, it is for not yet fully understood. Most, but not all, ] There are also a lot of retroviruses in human DNA, and at least 3 were proved to realise believe an important role, i.e. HIV-like HERV-K, HERV-W, HERV-FRD play role in placenta machinery by inducing cell-cell fusion.

In 2003, scientists featured the sequencing 85% of the entire human genome, but even in 2020 at least 8% was still missing.

In 2021, scientists reported sequencing the ready "female" genome, without Y chromosome that nevertheless allowed to"complete status". This sequence included 19,969 human Y chromosome, from a different cell line, containing 62,460,029 base pairs, and found in any men, has been sequenced totally in January 2022.


Protein-coding sequences symbolize the almost widely studied and best understood factor of the human genome. These sequences ultimately lead to the production of all human proteins, although several biological processes e.g. DNA rearrangements and alternative pre-mRNA splicing can lead to the production of many more unique proteins than the number of protein-coding genes. The set up modular protein-coding capacity of the genome is contained within the exome, and consists of DNA sequences encoded by exons that can be translated into proteins. Because of its biological importance, and the fact that it constitutes less than 2% of the genome, sequencing of the exome was the first major milepost of the Human Genome Project.

Number of protein-coding genes. approximately 20,000 human proteins have been annotated in databases such as Uniprot. Historically, estimates for the number of protein genes have varied widely, ranging up to 2,000,000 in the slow 1960s, but several researchers allocated out in the early 1970s that the estimated mutational load from deleterious mutations placed an upper limit of about 40,000 for the or situation. number of functional loci this includes protein-coding and functional non-coding genes. The number of human protein-coding genes is not significantly larger than that of many less complex organisms, such as the roundworm and the fruit fly. This difference may total from the extensive usage of alternative pre-mRNA splicing in humans, which allowed the ability to setting a very large number of modular proteins through the selective incorporation of exons.

Protein-coding capacity per chromosome. Protein-coding genes are distributed unevenly across the chromosomes, ranging from a few dozen to more than 2000, with an particularly high gene density within chromosomes 1, 11, and 19. each chromosome contains various gene-rich and gene-poor regions, which may be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood.

Size of protein-coding genes. The size of protein-coding genes within the human genome shows enormous variability. For example, the gene for histone H1a HIST1HIA is relatively small and simple, lacking introns and encoding an 781 nucleotide-long mRNA that produces a 215 amino acid protein from its 648 nucleotide open reading frame. Dystrophin DMD was the largest protein-coding gene in the 2001 human credit genome, spanning a total of 2.2 million nucleotides, while more recent systematic meta-analysis of updated human genome data identified an even larger protein-coding gene, RBFOX1 RNA binding protein, fox-1 homolog 1, spanning a total of 2.47 million nucleotides. Titin TTN has the longest development sequence 114,414 nucleotides, the largest number of exons 363, and the longest single exon 17,106 nucleotides. As estimated based on a curated style of protein-coding genes over the whole genome, the median size is 26,288 nucleotides mean = 66,577, the median exon size, 133 nucleotides mean = 309, the median number of exons, 8 mean = 11, and the median encoded protein is 425 amino acids mean = 553 in length.