The genome is the unique identity of every living organism and encodes its genetic information. The field of genomics aims to analyse the genome and discover knowledge from its structure and functions. Such a diagnosis is richer than a physical diagnosis of symptoms, because it targets the root cause of disease, which is often specific to the individual. Genomics relies on bioinformatics, which deals with collecting and analysing complex biological data such as genomes, and this analysis can be made efficient with big data technologies. For example, the Mayo Clinic has collaborated with IBM to gather and organize vast amounts of medical data on diseases, their symptoms, and their causes into a knowledge base for extracting unique insights about disease.

Use cases of big data in genomics include variant calling (identifying variants from sequence data), variant prioritization (identifying the disease-causing variants among many possible candidates), and peptide discovery for cancer immunotherapy (identifying antibodies that bind to the proteins synthesized by cancer cells). Big data can also support disease prevention, for example in cardiovascular disease: when a person suffers a stroke, their DNA can be sampled, the genomic variant that contributed to the stroke can be identified, and that knowledge can be used to manage the risk to other members of the family. MedGenome aims to apply state-of-the-art technologies to the genetic data of South Asians, whose genetic makeup is considered relatively homogeneous, and develop deep insights into the causes of rare diseases. Genome sampling has become economically viable: an entire genome can now be sequenced for about $1,000. This shift has caused a rapid increase in the amount of data collected, but the corresponding infrastructure for storage and computation is becoming a bottleneck to further knowledge discovery.
A lot of work still needs to be done on the execution pipeline for big data processing of genomic data, such as variant calling. FPGA-based hardware accelerators such as the DRAGEN processor have reduced the time to process sequencing data from about 40 hours to 8 hours. Variant calling is a relevant question because once a genome is sampled, the natural next concern is: "How different is my genome compared to normally observed genomes?" The answer can be used to predict potential disease risks for that individual.

Efficient string matching is central to this: how can we identify a particular sequence of base pairs in a genome? The Burrows-Wheeler transform has been used successfully to compress the data and index it for substring search, and suffix trees have also been used for string matching. Bayesian algorithms are being used to identify mutations in genetic sequences.

Variant prioritization is another interesting question: "How can we use machine learning to pick out, from roughly 3.5 million variants in a genome, the particular variant that causes disease?" This requires developing clever heuristics based on domain knowledge to eliminate irrelevant variants and converge on the likely culprits.

Cancer immunotherapy (immuno-oncology) is becoming increasingly feasible due to big data technology, which can perform peptide processing at large scale and discover the right peptide to inject for immunity against possible tumours. This advance could help cut the cost of treating diseases such as lung cancer, which is very expensive today (around $150,000). Genome-based diagnosis can be done from "pre-womb to tomb": knowledge of the parents' genomic data before a baby is conceived can be used to identify potential threats to the baby and help the couple make decisions related to birth planning.
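To make the Burrows-Wheeler idea concrete, here is a minimal FM-index sketch in Python: build the BWT of a reference string, then use backward search to count how often a query sequence occurs. This is illustrative only; production aligners such as BWA apply the same idea with compressed indexes over billions of bases.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations ('$' terminates)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)

def fm_index(last):
    """Precompute C (chars smaller than c) and Occ (rank table) from the BWT."""
    first = sorted(last)
    C = {c: first.index(c) for c in set(last)}
    occ = {c: [0] * (len(last) + 1) for c in set(last)}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ

def count_occurrences(pattern, last, C, occ):
    """Backward search: number of times pattern occurs in the indexed text."""
    lo, hi = 0, len(last)
    for c in reversed(pattern):          # match the pattern right to left
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:                     # interval collapsed: no match
            return 0
    return hi - lo

reference = "GATTACAGATTACA"
last = bwt(reference)
C, occ = fm_index(last)
print(count_occurrences("GATTACA", last, C, occ))  # 2
print(count_occurrences("TTAC", last, C, occ))     # 2
```

Note that the pattern is matched right to left: each step shrinks the interval of BWT rows whose rotations start with the remaining suffix, which is why the search never rescans the reference itself.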
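The Bayesian approach to mutation identification can be sketched as a toy genotype caller: given the bases observed in reads covering one site, compute the posterior probability of each diploid genotype under a fixed sequencing-error rate. The genotype labels, error rate, and priors below are assumptions for illustration, not the model of any particular tool.

```python
def genotype_posteriors(ref, alt, pileup, error=0.01, het_prior=1e-3):
    """Posterior probabilities for genotypes ref/ref (RR), ref/alt (RA), alt/alt (AA)."""
    def p_base(base, g):
        # Probability of observing a single base under each genotype.
        p_alt = {"RR": error, "RA": 0.5, "AA": 1 - error}[g]
        return p_alt if base == alt else 1 - p_alt

    priors = {"RR": 1 - 1.5 * het_prior,   # most sites match the reference
              "RA": het_prior,
              "AA": het_prior / 2}
    unnorm = {}
    for g in priors:
        lik = 1.0
        for base in pileup:              # reads assumed independent
            lik *= p_base(base, g)
        unnorm[g] = lik * priors[g]
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

# 10 reads at one site: 6 match the reference A, 4 carry the alternate G.
post = genotype_posteriors("A", "G", "AAAGAGAGAG")
print(max(post, key=post.get))  # RA
```

Even with a strong prior favouring the reference genotype, a 6/4 split in the reads is far more likely under a heterozygous genotype than under sequencing error alone, so the call flips to RA.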
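A domain-knowledge heuristic for variant prioritization might look like the following sketch: keep only variants that are rare in the population and annotated with a protein-damaging effect, then rank the rarest first. The field names, effect categories, and thresholds are hypothetical.

```python
# Effect annotations treated as likely damaging (an illustrative set).
DAMAGING = {"stop_gained", "frameshift", "missense"}

def prioritize(variants, max_pop_freq=0.001):
    """Filter to rare, likely-damaging variants; rarest first."""
    kept = [v for v in variants
            if v["pop_freq"] <= max_pop_freq and v["effect"] in DAMAGING]
    return sorted(kept, key=lambda v: v["pop_freq"])

candidates = [
    {"id": "rs1", "pop_freq": 0.30,   "effect": "synonymous"},
    {"id": "rs2", "pop_freq": 0.0002, "effect": "stop_gained"},
    {"id": "rs3", "pop_freq": 0.0008, "effect": "missense"},
    {"id": "rs4", "pop_freq": 0.0001, "effect": "intergenic"},
]
print([v["id"] for v in prioritize(candidates)])  # ['rs2', 'rs3']
```

The point is not the specific thresholds but the shape of the pipeline: cheap, domain-informed filters cut millions of candidates down to a shortlist that heavier machine-learning models or expert review can afford to examine.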
The ultimate goal of genomics in healthcare is to realize individualized medicine that is highly effective and cures disease at the root level. Big data technology, though still in its infancy, has a huge role to play in this revolution and needs to be made more efficient and scalable.