Expert Hadoop Administration

Managing, Tuning, and Securing Spark, YARN, and HDFS

Author: Sam R. Alapati

Publisher: Addison-Wesley Professional

ISBN: 9780134597195

Category: Computers

Page: 848

View: 2236

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference

"Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size." Paul Dix, Series Editor

In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples.

Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You'll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

• Understand Hadoop's architecture from an administrator's standpoint
• Create simple and fully distributed clusters
• Run MapReduce and Spark applications in a Hadoop cluster
• Manage and protect Hadoop data and high availability
• Work with HDFS commands, file permissions, and storage management
• Move data, and use YARN to allocate resources and schedule jobs
• Manage job workflows with Oozie and Hue
• Secure, monitor, log, and optimize Hadoop
• Benchmark and troubleshoot Hadoop

Hadoop 2.x Administration Cookbook

Author: Gurmukh Singh

Publisher: Packt Publishing Ltd

ISBN: 1787126870

Category: Computers

Page: 348

View: 3380

Over 100 practical recipes to help you become an expert Hadoop administrator

About This Book
• Become an expert Hadoop administrator and perform tasks to optimize your Hadoop cluster
• Import and export data into Hive and use Oozie to manage workflows
• Practical recipes will help you plan and secure your Hadoop cluster, and make it highly available

Who This Book Is For
If you are a system administrator with a basic understanding of Hadoop and you want to get into Hadoop administration, this book is for you. It's also ideal if you are a Hadoop administrator who wants a quick reference guide to all the Hadoop administration-related tasks and solutions to commonly occurring problems.

What You Will Learn
• Set up the Hadoop architecture to run a Hadoop cluster smoothly
• Maintain a Hadoop cluster on HDFS, YARN, and MapReduce
• Understand high availability with Zookeeper and Journal Node
• Configure Flume for data ingestion and Oozie to run various workflows
• Tune the Hadoop cluster for optimal performance
• Schedule jobs on a Hadoop cluster using the Fair and Capacity schedulers
• Secure your cluster and troubleshoot it for various common pain points

In Detail
Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploiting its unique features. With this book, you will be able to overcome common problems encountered in Hadoop administration. The book begins by laying the foundation, showing you the steps needed to set up a Hadoop cluster and its various nodes. You will get a better understanding of how to maintain a Hadoop cluster, especially on the HDFS layer and using YARN and MapReduce. Further on, you will explore the durability and high availability of a Hadoop cluster. You'll get a better understanding of the schedulers in Hadoop and how to configure and use them for your tasks. You will also get hands-on experience with the backup and recovery options and the performance tuning aspects of Hadoop. Finally, you will get a better understanding of troubleshooting, diagnostics, and best practices in Hadoop administration. By the end of this book, you will have a proper understanding of working with Hadoop clusters and will also be able to secure and encrypt your Hadoop clusters and configure auditing for them.

Style and approach
This book contains short recipes that will help you run a Hadoop cluster efficiently. The recipes are solutions to real-life problems that administrators encounter while working with a Hadoop cluster.

Smart Intelligent Computing and Applications

Proceedings of the Second International Conference on SCI 2018

Author: Suresh Chandra Satapathy,Vikrant Bhateja,Swagatam Das

Publisher: Springer

ISBN: 9811319278

Category: Technology & Engineering

Page: 689

View: 404

The proceedings cover advanced and multi-disciplinary research on the design of smart computing and informatics. The theme of the book broadly focuses on various innovation paradigms in system knowledge, intelligence and sustainability that may be applied to provide realistic solutions to varied problems in society, environment and industries. The volume publishes quality work pertaining to the scope of the conference, which is extended towards deployment of emerging computational and knowledge transfer approaches, optimizing solutions in varied disciplines of science, technology and healthcare.

Advanced Intelligent Systems for Sustainable Development (AI2SD’2018)

Volume 5: Advanced Intelligent Systems for Computing Sciences

Author: Mostafa Ezziyyani

Publisher: Springer

ISBN: 3030119289

Category: Computers

Page: 1005

View: 9921

This book includes the outcomes of the International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD-2018), held in Tangier, Morocco on July 12–14, 2018. Presenting the latest research in the field of computing sciences and information technology, it discusses new challenges and provides valuable insights into the field, the goal being to stimulate debate, and to promote closer interaction and interdisciplinary collaboration between researchers and practitioners. Though chiefly intended for researchers and practitioners in advanced information technology management and networking, the book will also be of interest to those engaged in emerging fields such as data science and analytics, big data, internet of things, smart networked systems, artificial intelligence, expert systems and cloud computing.

Big Scientific Data Benchmarks, Architecture, and Systems

First Workshop, SDBA 2018, Beijing, China, June 12, 2018, Revised Selected Papers

Author: Rui Ren,Chen Zheng,Jianfeng Zhan

Publisher: Springer

ISBN: 9811359105

Category: Computers

Page: 123

View: 1780

This book constitutes the refereed proceedings of the First Workshop on Big Scientific Data Benchmarks, Architecture, and Systems, SDBA 2018, held in Beijing, China, in June 2018. The 10 revised full papers presented were carefully reviewed and selected from 22 submissions. The papers are organized in topical sections on benchmarking; performance optimization; algorithms; big science data framework.

AWS Certified SysOps Administrator Associate All-in-One Exam Guide (Exam SOA-C01)

Author: Sam R. Alapati

Publisher: McGraw Hill Professional

ISBN: 1260135551

Category: Computers

Page: N.A

View: 1264

Publisher's Note: Products purchased from Third Party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.

This study guide covers 100% of the objectives for the AWS Certified SysOps Administrator Associate exam.

Take the challenging AWS Certified SysOps Administrator Associate exam with confidence using this highly effective self-study guide. You will learn how to provision systems, ensure data integrity, handle security, and monitor and tune Amazon Web Services performance. Written by an industry-leading expert, AWS Certified SysOps Administrator Associate All-in-One Exam Guide (Exam SOA-C01) fully covers every objective for the exam and follows a hands-on, step-by-step methodology. Beyond fully preparing you for the exam, the book also serves as a valuable on-the-job reference.

Covers all exam topics, including:
• Systems operations
• Signing up, working with the AWS Management Console, and the AWS CLI
• AWS Identity and Access Management (IAM) and AWS service security
• AWS compute services and the Elastic Compute Cloud (EC2)
• Amazon ECS, AWS Batch, AWS Lambda, and other compute services
• Storage and archiving in the AWS cloud with Amazon EBS, Amazon EFS, and Amazon S3 Glacier
• Managing databases in the cloud: Amazon RDS, Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache, and Amazon Redshift
• Application integration with Amazon SQS and Amazon SNS
• AWS high availability strategies
• Monitoring with Amazon CloudWatch, logging, and managing events
• Managing AWS costs and billing
• Infrastructure provisioning through AWS CloudFormation and AWS OpsWorks, application deployment, and creating scalable infrastructures

Online content includes:
• 130 practice questions
• Test engine that provides full-length practice exams or customized quizzes by chapter or by exam domain

Expert Apache Cassandra Administration

Author: Sam R. Alapati

Publisher: Apress

ISBN: 1484231260

Category: Computers

Page: 467

View: 651

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases. Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are day-to-day topics of importance such as the backup and recovery of Cassandra databases, using the right compression and compaction strategies, and loading and unloading data.

Expert Apache Cassandra Administration provides numerous step-by-step examples starting with the basics of a Cassandra database, and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to the building and management of simple to complex Cassandra databases.

The book:
• Takes you through building a Cassandra database from installation of the software and creation of a single database, through to complex clusters and data centers
• Provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases
• Shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency

What You'll Learn
• Install the Cassandra software and create your first database
• Understand the Cassandra data model, and the internal architecture of a Cassandra database
• Create your own Cassandra cluster, step-by-step
• Run a Cassandra cluster on Docker
• Work with Apache Spark by connecting to a Cassandra database
• Deploy Cassandra clusters in your data center, or on Amazon EC2 instances
• Back up and restore mission-critical Cassandra databases
• Monitor, troubleshoot, and tune production Cassandra databases, and cut your spending on resources such as memory, servers, and storage

Who This Book Is For
Database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs. Also for administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high volume online transactions, or as a decentralized, elastic data store.

Mastering Apache Solr 7.x

An expert guide to advancing, optimizing, and scaling your enterprise search

Author: Sandeep Nair,Chintan Mehta,Dharmesh Vasoya

Publisher: Packt Publishing Ltd

ISBN: 1788831551

Category: Computers

Page: 308

View: 2603

Accelerate your enterprise search engine and bring relevancy to your search analytics

Key Features
• A practical guide to building expertise with indexing, faceting, clustering and pagination
• Master the management and administration of enterprise search applications and services seamlessly
• Handle multiple data inputs such as JSON, XML, PDF, DOC, XLS, PPT, CSV and much more

Book Description
Apache Solr is the only standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites. To begin with, you will be introduced to performing full text search, multiple filter search, dynamic clustering and so on, helping you to brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which address numerous performance aspects and make data investigation simpler, easier and more powerful. You will learn to build complex queries and extensive filters, and how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, which elements affect the document score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries for re-ranking the documents. You will also learn advanced options that help you know what content is indexed and how the extracted content is indexed. Throughout the book, you will go through complex problems with solutions, along with varied approaches to tackle your business needs. By the end of this book, you will have gained advanced proficiency to build out-of-the-box smart search solutions for your enterprise demands.

What you will learn
• Design a schema using the Schema API to access data in the database
• Apply advanced querying and fine-tuning techniques for better performance
• Get to grips with indexing using the Client API
• Set up a fault-tolerant and highly available server with the newer distributed capabilities of SolrCloud
• Explore Apache Tika to upload data with Solr Cell
• Understand the different data operations that can be done while indexing
• Master advanced querying through Velocity Search UI, faceting and query re-ranking, pagination and spatial search
• Learn to use JavaScript, Python, SolrJ and Ruby for interacting with Solr

Who this book is for
The book will appeal to developers, software engineers, data engineers and database architects who are building or seeking to build effective enterprise-wide search engines for business intelligence. Prior experience of Apache Solr or Java programming is a must to get the most out of this book.

Mastering MongoDB 3.x

An expert's guide to building fault-tolerant MongoDB applications

Author: Alex Giamas

Publisher: Packt Publishing Ltd

ISBN: 1783982616

Category: Computers

Page: 342

View: 1458

An expert's guide to building fault-tolerant MongoDB applications

About This Book
• Master the advanced modeling, querying, and administration techniques in MongoDB and become a MongoDB expert
• Covers the latest updates and Big Data features frequently used by professional MongoDB developers and administrators
• If your goal is to become a certified MongoDB professional, this book is your perfect companion

Who This Book Is For
Mastering MongoDB is a book for database developers, architects, and administrators who want to learn how to use MongoDB more effectively and productively. If you have experience in, and are interested in working with, NoSQL databases to build apps and websites, then this book is for you.

What You Will Learn
• Get hands-on with advanced querying techniques such as indexing, expressions, arrays, and more
• Configure, monitor, and maintain a highly scalable MongoDB environment like an expert
• Master replication and data sharding to optimize read/write performance
• Design secure and robust applications based on MongoDB
• Administer MongoDB-based applications on-premise or in the cloud
• Scale MongoDB to achieve your design goals
• Integrate MongoDB with big data sources to process huge amounts of data

In Detail
MongoDB has grown to become the de facto NoSQL database with millions of users, from small startups to Fortune 500 companies. Addressing the limitations of SQL schema-based databases, MongoDB pioneered a shift of focus for DevOps and offered sharding and replication maintainable by DevOps teams. The book is based on MongoDB 3.x and covers topics ranging from database querying using the shell, built-in drivers, and popular ODM mappers to more advanced topics such as sharding, high availability, and integration with big data sources. You will get an overview of MongoDB and how to play to its strengths, with relevant use cases. After that, you will learn how to query MongoDB effectively and make use of indexes as much as possible. The next part deals with the administration of MongoDB installations on-premise or in the cloud. We deal with database internals in the next section, explaining storage systems and how they can affect performance. The last section of this book deals with replication and MongoDB scaling, along with integration with heterogeneous data sources. By the end of this book, you will be equipped with all the required industry skills and knowledge to become a certified MongoDB developer and administrator.

Style and approach
This book takes a practical, step-by-step approach to explain the concepts of MongoDB. Practical use cases involving real-world examples are used throughout the book to clearly explain theoretical concepts.