Programming Pig

Dataflow Scripting with Hadoop

DOWNLOAD NOW »

Author: Alan Gates,Daniel Dai

Publisher: "O'Reilly Media, Inc."

ISBN: 1491937068

Category: Computers

Page: 368

View: 2483

For many organizations, Hadoop is the first step for dealing with massive amounts of data. The next step? Processing and analyzing datasets with the Apache Pig scripting platform. With Pig, you can batch-process data without having to create a full-fledged application, making it easy to experiment with new datasets. Updated with use cases and programming examples, this second edition is the ideal learning tool for new and experienced users alike. You’ll find comprehensive coverage on key features such as the Pig Latin scripting language and the Grunt shell. When you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig. Delve into Pig’s data model, including scalar and complex data types Write Pig Latin scripts to sort, group, join, project, and filter your data Use Grunt to work with the Hadoop Distributed File System (HDFS) Build complex data processing pipelines with Pig’s macros and modularity features Embed Pig Latin in Python for iterative processing and other advanced tasks Use Pig with Apache Tez to build high-performance batch and interactive data processing applications Create your own load and store functions to handle data formats and storage mechanisms

The Stances of e-Government

Policies, Processes and Technologies

DOWNLOAD NOW »

Author: Puneet Kumar,Vinod Kumar Jain,Kumar Sambhav Pareek

Publisher: CRC Press

ISBN: 135139617X

Category: Computers

Page: 206

View: 4489

This book focuses on the three inevitable facets of e-government, namely policies, processes and technologies. The policies discusses the genesis and revitalization of government policies; processes talks about ongoing e-government practices across developing countries; technology reveals the inclusion of novel technologies.

Hadoop: The Definitive Guide

DOWNLOAD NOW »

Author: Tom White

Publisher: "O'Reilly Media, Inc."

ISBN: 1449396895

Category: Computers

Page: 628

View: 1989

Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters. This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book. Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase, Hadoop’s database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems "Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera

Planning for Big Data

DOWNLOAD NOW »

Author: Edd Wilder-James

Publisher: "O'Reilly Media, Inc."

ISBN: 1449329640

Category: Computers

Page: 83

View: 3205

In an age where everything is measurable, understanding big data is an essential. From creating new data-driven products through to increasing operational efficiency, big data has the potential to make your organization both more competitive and more innovative. As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven. Written by O'Reilly Radar's experts on big data, this anthology describes: The broad industry changes heralded by the big data era What big data is, what it means to your business, and how to start solving data problems The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions The landscape of NoSQL databases and their relative merits How visualization plays an important part in data work

Microsoft Big Data Solutions

DOWNLOAD NOW »

Author: Adam Jorgensen,James Rowland-Jones,John Welch,Dan Clark,Christopher Price,Brian Mitchell

Publisher: John Wiley & Sons

ISBN: 1118729552

Category: Computers

Page: 408

View: 8040

Tap the power of Big Data with Microsoft technologies Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with HortonWorks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and HortonWorks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop. Walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, HortonWorks Data Platform for Windows, and open source tools Explores both on-premises and cloud-based solutions Shows how to store, manage, analyze, and share Big Data through the enterprise Covers topics such as Microsoft's approach to Big Data, installing and configuring HortonWorks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and HortonWorks BI tools, and more Helps you build and execute a Big Data plan Includes contributions from the Microsoft and HortonWorks Big Data product teams If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.

Learning Hadoop 2

DOWNLOAD NOW »

Author: Garry Turkington,Gabriele Modena

Publisher: Packt Publishing Ltd

ISBN: 1783285524

Category: Computers

Page: 382

View: 5789

If you are a system or application developer interested in learning how to solve practical problems using the Hadoop framework, then this book is ideal for you. You are expected to be familiar with the Unix/Linux command-line interface and have some experience with the Java programming language. Familiarity with Hadoop would be a plus.

HDInsight Essentials - Second Edition

DOWNLOAD NOW »

Author: Rajesh Nadipalli

Publisher: Packt Publishing Ltd

ISBN: 1784396664

Category: Computers

Page: 178

View: 7955

If you want to discover one of the latest tools designed to produce stunning Big Data insights, this book features everything you need to get to grips with your data. Whether you are a data architect, developer, or a business strategist, HDInsight adds value in everything from development, administration, and reporting.

Big Data for Chimps

A Guide to Massive-Scale Data Processing in Practice

DOWNLOAD NOW »

Author: Philip (flip) Kromer,Russell Jurney

Publisher: "O'Reilly Media, Inc."

ISBN: 149192392X

Category: Computers

Page: 220

View: 9590

Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems. Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data. Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster Dive into map/reduce mechanics and build your first map/reduce job in Python Understand how to run chains of map/reduce jobs in the form of Pig scripts Use a real-world dataset—baseball performance statistics—throughout the book Work with examples of several analytic patterns, and learn when and where you might use them