A Review: Big Data Technologies with Hadoop Distributed Filesystem and Implementing M/R
Today, Big Data refers to any data set that is too large to be captured, shared, transferred, stored, managed, and analyzed with traditional database tools within an acceptable time frame. From the point of view of service providers, organizations need to deal with large amounts of data for the purpose of analysis, and IT departments face tremendous challenges in protecting and analyzing these growing volumes of information. Organizations are collecting and storing more data than ever before because their business depends on it. The information being created is no longer only traditional database-driven data, referred to as structured data; it also includes documents, images, audio, video, and social media content, known as unstructured data or Big Data. Big Data Analytics is a way of extracting value from these huge volumes of information; it drives new market opportunities and maximizes customer retention. This paper focuses on discussing and understanding Big Data technologies and analytics systems built on the Hadoop Distributed File System (HDFS). Such analytics can help organizations predict future trends, obtain information, take proactive actions, and make better strategic decisions.
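The title refers to implementing M/R (MapReduce) over HDFS. As a minimal, single-process sketch of the MapReduce programming model only, the Python program below counts words in three phases: map emits (word, 1) pairs, shuffle groups values by key, and reduce sums each group. It does not use Hadoop's Java API or Hadoop Streaming, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. In a real Hadoop job, map tasks run in parallel over HDFS blocks and the framework itself performs the shuffle and sort between the map and reduce stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated, lowercased token.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key
    # (Hadoop does this step for you between map and reduce).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all counts emitted for one word.
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document stands in for one input split (e.g. an HDFS block).
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    # Sorting keys mimics the sorted input that Hadoop reducers receive.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["big data needs hadoop", "hadoop stores big data in hdfs"]
    print(word_count(splits))
    # → {'big': 2, 'data': 2, 'hadoop': 2, 'hdfs': 1, 'in': 1, 'needs': 1, 'stores': 1}
```

Because the map and reduce functions only see one record or one key group at a time, the same logic can be distributed across many machines, which is what lets MapReduce scale to data sets stored in HDFS.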
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.