Hadoop: The Definitive Guide
$47.49
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
Use Pig, a high-level query language for large-scale data processing
Take advantage of HBase, Hadoop’s database for structured and semi-structured data
Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
If you have lots of data — whether it’s gigabytes or petabytes — Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.
“Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk.” – Doug Cutting, Hadoop Founder, Yahoo!
ASIN : B0043D2ECC
Publisher : O’Reilly Media; 1st edition (May 29, 2009)
Publication date : May 29, 2009
Language : English
File size : 5920 KB
Simultaneous device usage : Unlimited
Text-to-Speech : Enabled
Screen Reader : Supported
Enhanced typesetting : Enabled
X-Ray : Not Enabled
Word Wise : Not Enabled
Sticky notes : On Kindle Scribe
Print length : 908 pages
Maha A. Alabduljalil –
Excellent for a beginner
The book is clear and easy to follow, especially for a beginner like me. It has short examples for most of the cases you might think of. I think of it as guidance on how to learn Hadoop’s functionality and classes in the right order. I cannot be more precise in my review, though, since I haven’t read another book about Hadoop; most of my references are online, especially the Yahoo site. I’m not sure how advanced it is: I don’t have a real cluster, so I can’t tell whether what it covers is enough for a real cluster’s problems and configuration issues. The book also discusses other Apache projects such as Hive and HBase. Other books do this too, but what’s amazing here is that all the code mentioned in the book is also provided, so you can start running it directly without the hassle of writing it from scratch.
Chris Joslin –
Partly succeeds
Tom White certainly writes very well: this book is very readable. It is also quite comprehensive, falling somewhere between a tutorial and a reference.
That being said, I was ultimately rather disappointed. First, and most importantly, it was not clear to me after reading this book how I might use Hadoop for some of my projects, or if indeed they were good candidates for MapReduce. I feel it should have been possible to provide some generic guidance. Second, some chapters are written by other authors, and these did not uniformly provide the same quality of instruction, reading occasionally like advertisements.
I confess I am puzzled by the number of encapsulating and utility APIs that have grown up around Hadoop. Why do we need Pig, HBase, Hive, Zookeeper and Cascading? Apparently because (according to what I have read here), bare Hadoop is hard to program with (productively). Some indication of how these wrappers interact with each other would have been helpful.
As it is, I feel LESS urge to evangelize for Hadoop having read this book. Surely not the desired effect?
F. Yang –
good book
I especially like the part that talks about MapReduce; it makes it easy to understand.
Arun Ramakrishnan –
great book
What I really liked most about this book was that I could read the vast majority of it straight through and enjoy the process. It is very well structured, and the example built around weather station data was an appropriate choice to give a good perspective on most of the problems. A good mix of practical theory, examples and code snippets.
Kattamuri S. Sarma –
Three Stars
Has some good examples
JUG Lugano –
The elephant is tamed
Original review written by Paolo Canesi, JUG Lugano, […]
Managing and analyzing huge data sets has become a very common problem in various areas of modern information technology, from different types of Web applications (social, financial, trading, …) to applications for analyzing scientific data.
Distributed systems running over a cluster of machines are almost a mandatory choice in such cases, but designing and implementing an effective solution in those areas can be troublesome and become a nightmare.
The Apache Hadoop Project is an infrastructure that helps in building reliable, scalable, distributed systems. Mainly known for its MapReduce and distributed file system (HDFS) subprojects, it actually includes other services that complement or extend them.
Tom White’s “Hadoop: The Definitive Guide” is an enjoyable book which fully explains these complex technologies. The book is organized in such a way that the reader is gently guided into the Hadoop ecosystem. It begins with a couple of very readable chapters as a general introduction to the problems Hadoop is meant to solve and the main solutions to them (MapReduce and HDFS), then examines all its aspects closely, often describing what really happens behind the scenes, giving useful design suggestions and describing common pitfalls. When reading this book you won’t be overwhelmed by tons of lines of code: examples are short yet effective.
This kind of structure makes it hard to classify the book as a mere tutorial or as a real reference guide; it is better considered a mix of the two. While this turns out to be a positive choice in many ways, it has some drawbacks: the reader is sometimes forced to go back and forth through the chapters and has to read it almost entirely to get a full understanding. But this is perhaps the price to pay for fluent and pleasant reading.
Let’s go quickly through the chapters:
The first chapter is a brief history of the Hadoop project, illustrating its main characteristics and comparing them to those of other similar technologies.
Chapter two is a pleasant introduction to MapReduce.
The third chapter breaks the continuity of the previous one, examining the Hadoop Distributed File System (HDFS subproject) in detail.
Chapter four steps down a layer of abstraction to cover the Hadoop I/O fundamentals: data integrity, compression, serialization and data structures, explaining the design choices.
Chapters five to eight are an excellent source for learning Hadoop MapReduce in depth. They cover all aspects of it, from the practical ones, such as how to configure, run, test and debug MapReduce programs, to the more advanced and formal ones, like programming models, data formats, sorting and joining tools.
The two following chapters list a few very interesting and useful suggestions for managing and setting up a Hadoop cluster, a precious resource for administrators.
Chapters eleven to thirteen cover the Pig, HBase and ZooKeeper subprojects under the Hadoop umbrella. Despite suffering from brevity, they are still interesting.
Chapter fourteen is there so that the reader does not feel alone: it presents important case studies using Hadoop (e.g. Yahoo, and other contributions from the Apache Hadoop community).
My final opinion is that “Hadoop: The Definitive Guide” is a very useful resource for those who want to learn how to ride the “pachydermic” Hadoop (like a “Mahout”, perhaps?).
Three brothers –
Good book in great condition
Good book, just like a brand new book, without any rips. Arrived fast. Great for my self-study at home.