|Published (Last):||7 November 2004|
Abdulbasit Shaikh. This book is a user-friendly guide that covers all the necessary steps and examples related to web crawling and data mining using Apache Nutch.
It is a good start for those who want to learn how web crawling and data mining are applied in the current business world. It would be an added benefit for those who have some knowledge of web crawling and data mining.
Community Reviews. Average rating 2.
Apr 23, Emir Arnautovic rated it did not like it. This book is poorly written, badly organised, and full of incorrect, incomplete and misleading statements. It touches on a variety of topics and technologies that are related but not expected to dominate a book with this title.
Jan 22, Chris rated it did not like it. While I accept that talking about how Nutch stores its crawl data is necessary, do we really need an introduction on how to install MySQL and Apache Accumulo? It is even less compelling when most of the part about installing Accumulo is copied directly from the referenced blog post. The authors have, however, gone through the trouble of compiling information scattered through the documentation and various blog posts into one book.
I would like it if the book were better organized, though. It feels jumpy, repetitive, and unstructured. It jumps back and forth between Nutch 1. It would probably have made more sense for the authors to split it into 2 books, one dedicated to each version, rather than try to mash them together so haphazardly.
In our age of Data Explosion it becomes increasingly appealing, if not necessary, to scout the myriad of what it looks like though shrinking World Wide Web pages.
Even if you are not tasked with crawling a subset of web pages today, you may want to grab a copy of the Web Crawling and Data Mining with Apache Nutch book to be well prepared in advance.
Advantageously, the book is not excessively long, so even if you are in a hurry, it will allow you to accomplish the desired scope in a short time.
Be aware that the book concentrates a lot on making related software communicate with each other and devotes a significant portion to setting things up in general, so you may need to check for changes in how to integrate or install the parts if you happen to work with newer releases of the software involved. I need to give credit to the authors here: they have made every effort to showcase Nutch's capabilities and yet keep your solution scalable. However, Nutch crawl optimization is for some reason missing.
The book gladly covers index processing, which is compulsory, but unfortunately, in my opinion, does not expand enough on a necessary part: Apache Solr. The book also covers Apache Gora, but leaves out the option to integrate with Cassandra. On a not so happy note, the book concentrates a lot on the infrastructure aspects, so while reading it I wished the authors had provided better explanations about the place of the technologies covered. At the least, an account of what Nutch is comprised of, supplemented with real-life usage examples and perhaps a case study or two, would not hurt.
It also felt at the beginning like the book lacks some background preparation for the reader, so at times I needed to pause to seek additional information. Some references would be nice to have, along with a glossary of terms. Nevertheless, overall, it is a good read: 4 out of 5 is my verdict.
Jan 20, Chris rated it liked it. The book begins with an explanation of dependencies, an overview of the Apache Nutch file structure and a simple demonstration of how Nutch can crawl webpages.
Most of the book is dedicated to implementation. I'd recommend it to experienced software, information management or data analytics professionals with a strong foundation in software implementation. Overall not a bad book. I'll probably turn this into a weekend project just to get a feel for the different Apache products mentioned in this book and also to see how Nutch functions.
Feb 11, Paul added it. It is really a great book, and it helped me in my project, where I need to crawl web content and do data analysis. From the book I learned how to use and integrate the Nutch and Solr frameworks to implement it. If you have a similar case, I recommend reading this book.
Co-author: Zakir Laliwala.
Web Crawling and Data Mining with Apache Nutch.
The first quarter of the book is largely introductory. For me, the book got a bit more interesting when it covered the Nutch Plugin architecture. The book then covers deployment and scaling. This includes detailed instructions on Hadoop installation and configuration. This is followed by a chapter on persistence mechanisms, which uses Gora to abstract away the actual storage.