
Review Paper-Brief Knowledge about Big Data

Abstract-In the digital era, the amount of data generated and stored has expanded enormously within a short period of time, and this rapid growth has created many challenges. Size is the primary, and at times the only, dimension that stands out when big data is discussed. This paper attempts to offer a broader definition of big data that captures its other distinctive and important characteristics [2]. We discuss the 4V’s model, the technologies used for big data, such as Hadoop, HDFS, and MapReduce, the methods used in big data analysis and the benefits they bring, and some features of cloud computing.

Keywords-Big Data, Big Data Analytics, IoT, Hadoop, HDFS, Cloud Computing

1. Introduction

This paper presents the basic concepts of big data. Over the last two decades, data has increased on a huge scale in various fields. Since the creation of computers, large amounts of data have been generated at a rapid rate. Advances in mobile devices, digital sensors, communications, computing, and storage have provided the means to gather data. According to a 2011 report by the IT research company International Data Corporation (IDC), the total amount of data in the world had increased nine times within five years. This figure is projected to at least double every two years. Nowadays, big data related to the services of Internet companies is growing rapidly.
For example, Google processes hundreds of petabytes (PB) of data, Facebook generates tens of PB of log data per month, Baidu, a Chinese company, processes 10 PB of data, and Taobao, a subsidiary of Alibaba, generates 10 terabytes (TB) of online trading data every day [3].

While the number of huge datasets is rising drastically, this growth also brings many challenging problems that demand prompt solutions:

– The latest advances in information technology (IT) make it easier to generate data. For example, on average, 72 hours of video are uploaded to YouTube per minute [6]. We therefore face the major challenge of collecting and integrating immense amounts of data from widely distributed data sources.

– The rapid growth of cloud computing and the Internet of Things (IoT) further promotes the rapid growth of data. Cloud computing provides safeguarding, access sites, and channels for data assets. In IoT, sensors all over the world collect and transmit data, which is stored and processed in the cloud. Such data, in both amount and mutual relations, will far surpass the capacities of the IT architectures and infrastructure of existing enterprises, and its real-time requirements will also greatly stress the available computing capacity [4]. The ever-growing data raises the problem of how to store and manage huge datasets with moderate requirements on software and hardware infrastructure.

– Considering the scalability, heterogeneity, complexity, real-time nature, and privacy of big data, we must effectively "mine" the datasets at different levels during analysis, forecasting, visualization, and modeling, so as to reveal their essential properties and improve decision making [3, 20, 21, 22].

2. Sources of Big Data

Big data is a mixture of different kinds of granular data.
The main sources producing voluminous amounts of data are the Internet of Things (IoT), self-quantified data, multimedia, and social media [17]. Some of them are listed below:
• Google
• Google+
• Yahoo
• Amazon
• Facebook
• YouTube
• LinkedIn
• Twitter
• Apple
• Instagram
• WordPress

2.1 Characteristics of Big Data

Fig. 1: 4V’s model of Big Data

Volume: It is the most important characteristic of big data. It represents the size of the big data set.

Variety: These data do not have a fixed structure and rarely present themselves in a perfectly ordered form, ready for processing [5]. Indeed, such data can be highly structured (data from relational databases), semi-structured (web logs, social media feeds, raw feeds directly from a sensor source, email, etc.), or unstructured (video, still images, audio, clicks) [4, 5].

Velocity: It involves streams of data, the creation of structured records, and availability for access and delivery. It is not just the velocity of the incoming data that is the issue: it is possible, for example, to stream fast-moving data into bulk storage for later batch processing. The importance lies in the speed of the feedback loop, taking data from input through to decision [4, 5].

Veracity: It is the accuracy of the data. The data should be acquired from correct sources and its security should be ensured. Only authorized people should have access permission [6].

3. Big Data Technologies

Several tools can be used in big data management, from data acquisition to data analysis, as illustrated in Fig. 2.

Fig. 2: Big Data Technology

3.1 Hadoop

Hadoop is an open-source application framework for big data. While Hadoop can be installed on any system, it is often used as a cloud service. Some of the providers of Hadoop services are Amazon, Cloudera, EMC, Hadapt, Hortonworks, IBM, Informatica, Karmasphere, MapR, Microsoft, and Oracle.
[10, 12]

3.2 HDFS

HDFS, the Hadoop Distributed File System, is designed to run on large clusters of commodity hardware and is based on the Google File System (GFS) [7, 8]. HDFS is distributed, scalable, and portable. It is used to store large files (gigabytes to terabytes in size) across many servers. Thus, Hadoop can hold hundreds or even millions of separate files spread across many computers (possibly thousands), all connected to each other through the software. HDFS is dedicated to batch processing rather than interactive use [5, 8]. In HDFS applications, files are written once and accessed many times [8, 9]; consequently, data coherency is ensured and data are accessed at high throughput [8]. In HDFS, file system metadata are stored on a dedicated server, the NameNode, and the application data on other servers called DataNodes. Apart from processing huge datasets, HDFS has many additional goals, the major one being to detect and handle failures at the application layer. This objective is realized through a well-organized replication mechanism in which files are divided into blocks. Each block is replicated on a number of DataNodes, and the DataNodes holding replicas of a given block are not all located in the same rack [4, 12].

3.3 MapReduce

MapReduce is a programming model built on two steps, mapping and reducing. Mapping splits a task and its related data into many pieces so that they can be sent to several different servers and processed in parallel. Reducing takes the results from the different computers and combines them into a single result. YARN, which stands for Yet Another Resource Negotiator, is not a replacement for the MapReduce model itself: it is the resource management layer introduced in Hadoop 2, which took over cluster resource management and job scheduling from the original MapReduce engine, so that MapReduce and other processing frameworks can run on top of it. Pig is a platform that allows one to write MapReduce programs. It has its own programming language, known as Pig Latin. Hive is another module, which supports summarization, queries, and data analysis.
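The mapping and reducing steps described in Section 3.3 can be illustrated with a small word-count sketch. This is only an in-memory illustration under stated assumptions, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, and everything runs in a single process, whereas a real cluster would send each chunk to a different server.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all chunks."""
    intermediate: List[Tuple[str, int]] = []
    for chunk in chunks:  # on a cluster, each chunk would be mapped on a different server
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, split into two chunks.
    print(word_count(["big data needs big storage", "data moves fast"]))
```

The split into three small functions mirrors the description above: the map step works on each piece independently, and the reduce step only ever sees the values for one key, which is what allows both steps to be spread across many machines.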
Hive also has its own language, HiveQL (a SQL-like language), for queries [10, 12, 19].

4. Areas of Usage of Big Data

Big data is used professionally in several fields [1]. Some of them are listed below:
• Education and research
• Oil and gas
• Automotive industry
• Telecommunication sector
• High technology and industry
• Retail industry
• Medical field
• Packaged consumer products
• Travel and transport sector
• Media and show business
• Social media and online services
• Public services
• Health services
• Financial services
• Law enforcement and defense industry

5. Methods Used in Big Data

5.1 Text Analytics

Text analytics is used to retrieve information from text data. E-mails, online forums, blogs, news, and call center records are examples of text data. Text analytics involves statistical analysis, machine learning, and computational linguistics. It allows meaningful summaries to be extracted from large volumes of text. Techniques used in text analytics include text summarization, information extraction, question answering, and sentiment analysis [1].

5.2 Audio Analytics

Audio analytics is used to extract information from unstructured audio data. Call centers are a main application area of audio analytics. Audio analytics can be used for purposes such as improving the customer experience, the performance of customer representatives, and the sales rate, and for understanding matters such as customer behavior and problems with products [1].

5.3 Video Analytics

Video analytics covers various techniques used to track and analyze video streams and extract meaningful information from them. Marketing and operations management are the main application areas of video analytics [1].

5.4 Social Media Analytics

Social media analytics is the analysis of structured and unstructured data from social media channels [1]. Social media can be categorized as follows:
• Social networks (e.g., Facebook and LinkedIn)
• Digital marketing (e.g., Amazon and Flipkart)
• Blogs (e.g., Blogger and WordPress)
• Microblogs (e.g., Twitter and Tumblr)
• Social news (e.g., Digg and Reddit)
• Social bookmarking (e.g., Delicious and StumbleUpon)
• Media sharing (e.g., Instagram and YouTube)
• Wikis (e.g., Wikipedia and wikiHow)
• Question-and-answer sites (e.g., Yahoo! Answers)
• Review sites (e.g., Yelp and TripAdvisor)

5.5 Mobile Data Analysis

By April 2013, Android Apps offered more than 650,000 applications, covering nearly all categories. By the end of 2012, the monthly mobile data flow had reached 885 PB [14]. The massive data and abundant applications call for mobile analysis, but also bring a few challenges. As a whole, mobile data has unique characteristics, e.g., mobile sensing, moving flexibility, noise, and a large amount of redundancy. Recently, new research on mobile analysis has started in different fields. Since this research has only just begun, we introduce only some recent and representative analysis applications in this section [3].

With the increasing number of mobile users and improved performance, mobile phones are now useful for building and maintaining communities, such as communities based on geographical location and communities based on different cultural backgrounds and interests (e.g., the latest Web chat). Traditional network communities or SNS communities lack online interaction among members, and such communities are active only when members are sitting in front of computers. In contrast, mobile phones can support rich interaction at any time and anywhere. A mobile community is defined as a group of people with the same hobbies (e.g., health, safety, and entertainment) who gather on networks, meet to set a common goal, decide on measures through consultation to achieve the goal, and start to implement their plan [15]. In [16], the authors proposed a qualitative model of a mobile community. It is now broadly believed that mobile community applications will greatly promote the development of the mobile industry [3, 21].

6. Internet of Things (IoT)

The Internet of Things (IoT) is an important source of big data. It comprises a large number of devices connected to the network, enabling "anytime, anywhere" access to information. These devices can be managed from the web and, in turn, provide information in real time, allowing interaction with the people who use them. As a phenomenon with more profound impacts on our society than most others, the IoT can be seen in industry, security, agriculture, traffic, transportation, education, medical care, healthcare, public departments, families, and many other domains [11].

With respect to data acquisition and transmission in IoT, the network architecture can be divided into three layers: the sensing layer, the network layer, and the application layer. The sensing layer is responsible for data acquisition and mostly consists of sensor networks. The network layer is responsible for information transmission and processing, where short-range transmission can rely on sensor networks and remote transmission depends on the Internet. Finally, the application layer supports specific applications of IoT [3].

6.1 Application of IoT based Big Data

The Internet of Things (IoT) is not only an important source of big data, but also one of the main markets for big data applications. Because of the high variety of objects, the applications of IoT also evolve endlessly. Logistics enterprises have perhaps the deepest experience with the application of IoT big data. For example, UPS trucks are equipped with sensors, wireless adapters, and GPS, so headquarters can track truck positions and prevent engine failures. This system also helps UPS to supervise and manage its employees and optimize delivery routes. The optimal delivery routes specified for UPS trucks are derived from their past driving experience. In 2011, UPS drivers drove nearly 48.28 million km less. The smart city is a hot research area based on the application of IoT data.
For example, the smart city project run jointly by Miami-Dade County in Florida and IBM closely connects 35 types of key county government departments and the city of Miami, and helps government leaders obtain better information support in decision making for managing water resources, reducing traffic jams, and improving public safety. The smart city application brings benefits in many respects for Dade County. For example, the Department of Park Management of Dade County saved 1 million USD in water bills this year by identifying and fixing running and leaking water pipes in a timely manner [3].

6.2 Relationship between IoT and Big Data

In IoT, an enormous number of networked sensors are embedded into various devices and machines in the real world. Such sensors, deployed in different fields, may collect various kinds of data, such as environmental data, geographical data, astronomical data, and logistics data. Mobile equipment, transportation facilities, public facilities, and home appliances could all be data acquisition equipment in IoT, as illustrated in Fig. 3 [3].

Big data generated by IoT has different characteristics from general big data because of the different types of data collected; its most typical characteristics include heterogeneity, variety, unstructured features, noise, and high redundancy. Although IoT data is not currently the dominant part of big data, by 2030 the number of sensors will reach one trillion and IoT data will then be the most important part of big data, according to a forecast by HP.
A report from Intel points out that big data in IoT has three features that conform to the big data paradigm:
(i) Abundant terminals generate masses of data.
(ii) Data generated by IoT is usually semi-structured or unstructured.
(iii) IoT data is useful only when it is analyzed.

At present, the data processing capacity of IoT has fallen behind the volume of collected data, and it is extremely urgent to accelerate the introduction of big data technology to promote the development of IoT. Many IoT operators understand the importance of big data, as the success of IoT hinges upon the effective integration of big data and cloud computing. The extensive use of IoT will also bring many cities into the big data era [21, 22].

There is a compelling need to adopt big data for IoT applications, while the development of big data is already lagging behind. It is widely recognized that these two technologies are interdependent and should be jointly developed: on the one hand, the extensive deployment of IoT drives the high growth of data in both quantity and category, thus providing the opportunity for the application and development of big data; on the other hand, the application of big data technology to IoT also accelerates the research advances and business models of IoT [3].

6.3 Relationship between Hadoop and Big Data

Nowadays, Hadoop is broadly used in big data applications in industry, e.g., spam filtering, network searching, clickstream analysis, and social recommendation. In addition, considerable academic research is now based on Hadoop. Some representative cases are given below. In June 2012, Yahoo confirmed that it ran Hadoop on 42,000 servers at 4 data centers to support its products and services, such as searching and spam filtering. At that time, the largest Hadoop cluster had 4,000 nodes, with the number of nodes expected to increase to 10,000 with the release of Hadoop 2.0. In the same month, Facebook announced that its Hadoop cluster could process 100 PB of data, which was growing by 0.5 PB per day as of November 2012.
Some well-known organizations that use Hadoop to perform distributed computation are listed in [13]. In addition, many companies provide commercial Hadoop distributions and/or support, including Oracle, Cloudera, MapR, IBM, and EMC, among others [3].

6.4 Relationship between Cloud Computing and Big Data

Cloud computing is closely related to big data. Big data is the object of computation-intensive operations and stresses the storage capacity of a cloud system. The key goal of cloud computing is to use huge computing and storage resources under centralized management, so as to provide big data applications with fine-grained computing capacity. The development of cloud computing provides solutions for the storage and processing of big data. Conversely, the emergence of big data accelerates the progress of cloud computing. Cloud computing can manage big data effectively because it is based on distributed storage technology, and the parallel computing capacity it provides can improve the efficiency of acquiring and analyzing big data.

7. Why We Choose Big Data Analytics

Big data analytics is used to make faster and better decisions and to reduce costs. It also helps to analyze new products and services, as illustrated in Fig. 4 [18].

8. Challenges of Big Data

One of the most basic challenges is to understand the data and separate what matters from the garbage that is coming into the enterprise. Ninety percent of all the data is noise, and it is a daunting task to classify the data and filter the knowledge from the noise. In the search for inexpensive methods of analysis, organizations have to compromise and balance cost against the confidentiality requirements of the data. The use of cloud computing and virtualization further complicates the decision to host big data solutions outside the enterprise, but using those technologies is a trade-off against the cost of ownership that every organization has to deal with. The data is piling up so quickly that it is becoming costlier to store it.
Organizations struggle to decide how long this data has to be retained, as some data is useful for making long-term decisions, while other data is no longer relevant even a few hours after it has been generated. With the arrival of the new technologies and tools required to build big data solutions, the availability of skills is a big challenge. A high level of proficiency in data science is required to implement big data solutions today because the tools are not yet user-friendly. They still require computer science graduates to configure and operationalize a big data system.

9. Conclusion

We are living in the era of the data deluge. The term Big Data has been coined to describe this age. This paper defines the concept of Big Data and describes its characteristics. In addition, a supply chain and technologies for Big Data management are presented [4]. We have discussed the 4V’s model: Volume, Variety, Velocity, and Veracity. Volume is the most tackled aspect, and many works leverage Hadoop MapReduce to deal with it. Increasingly, and unlike velocity, the informality and uncertainty of web and social media data are also being addressed by scientists. We have discussed the areas in which big data is used, as well as the methods used in big data, namely text, audio, video, social media, and mobile data analytics. We have also covered some features of IoT and the relationships between IoT and big data, Hadoop and big data, and cloud computing and big data.

Several difficulties may arise in the acquisition, storage, and processing of data. As interest in big data increases, such difficulties will decrease or will be solved in a shorter time. Data providers bear as much responsibility as researchers in big data. Data providers that cannot process their data will be harmed competitively if they keep their data hidden. Big firms such as Google, Facebook, Amazon, Baidu, and YouTube have already moved towards big data because they all have broad visions.
Each of those firms holds its own market and expands its dominance while ensuring customer satisfaction. They keep their leadership and increase their market value day by day.