Stuff the internet says about data for February 1st - March 1st, 2015

It was a busy month, but here are some of the articles, tutorials and blog posts that we shared over the last month!

Persistence

Good read: Understanding Transactional #NoSQL systems http://learnallyeah.com/understanding-transactional-nosql-systems/

Watch the "Data Exploration with ElasticSearch" meetup online: http://blog.monokkel.io/watch-the-elasticsearch-meetup-online/ #bigdata #processing #nosql

Guaranteeing exactly-once load semantics in moving data from Kafka to HDFS http://blog.thedatateam.in/2015/02/guaranteeing-exactly-once-load.html?m=1 #hadoop #persistence

Building a topic graph with the Prismatic Interest Graph API, by Mark Needham http://www.markhneedham.com/blog/2015/02/13/neo4j-building-a-topic-graph-with-prismatic-interest-graph-api/ @neo4j #persistence

A well-written article about LSM trees: http://www.benstopford.com/2015/02/14/log-structured-merge-trees/ #persistence #bigdata

A well-written HBase tutorial http://www.inspiredtechies.com/hbase-tutorial/ #nosql #bigdata #persistence

How To Choose A #NoSQL Database? http://www.nextbigwhat.com/how-to-choose-nosql-database-297/ #bigdata @nextbigwhat

#NoSQL Performance: Measuring Couchbase Performance with Couchdoop http://blog.bigstep.com/big-data-performance/connecting-hadoop-couchbase-couchdoop-view-performance/ #persistence

RT @PlanetCassandra: .@Alcatel_Lucent migrates from #Oracle to Apache #Cassandra for a solid, scalable foundation http://planetcassandra.org/blog/interview/alcatel-lucent-migrates-from-oracle-to-apache-cassandra-for-a-solid-scalable-foundation/

WANalytics: Analytics for a geo-distributed, data intensive world http://blog.acolyer.org/2015/02/03/wanalytics-analytics-for-a-geo-distributed-data-intensive-world/ #persistence #bigdata

RT @MaxCRoser: From 10 Million Dollar to 9 cents! The price of 1GB Hard Drive since 1950 (Source: http://bit.ly/1E4xzDD)

Processing

12 common problems in Data Mining http://www.bigdata-madesimple.com/12-common-problems-in-data-mining/ #bigdata

Knowledge and Common Knowledge in a Distributed Environment http://blog.acolyer.org/2015/02/16/knowledge-and-common-knowledge-in-a-distributed-environment/ #bigdata

RT @kdnuggets: #Hadoop Creator: If You Want To Succeed With #BigData, Start Small http://buff.ly/1MXf35Y #StrataHadoop #Success http://t.co…

Ubuntu, Hortonworks and Microsoft = #BigData Hosted Solution https://insights.ubuntu.com/2015/02/19/ubuntu-hortonworks-and-microsoft-big-data-hosted-solution/ #processing

#BigData #Processing in Spark http://horicky.blogspot.no/2015/02/big-data-processing-in-spark.html?m=1

Distributed data #processing with Apache Flink http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html?m=1 #bigdata

RT @foundsays: New article about common use cases for #Elasticsearch, with tons of references to resources: https://www.found.no/foundation/uses-of-elasticsearch/

Google open sources a MapReduce framework for C++ https://gigaom.com/2015/02/18/google-open-sources-a-mapreduce-framework-for-c/ #processing

I Thought Of Sharing These 7 Machine Learning Concepts With You http://www.bigdataexaminer.com/i-thought-of-sharing-these-7-machine-learning-concepts-with-you/ #bigdata #processing

#Processing frameworks for Hadoop http://radar.oreilly.com/2015/02/processing-frameworks-for-hadoop.html #bigdata

Monokkel is running an #elasticsearch bootcamp in Oslo: http://blog.monokkel.io/monokkel-is-running-an-elasticsearch-boot-camp-in-2015/

A/B testing: An online formula for Bayesian testing https://salasboni.wordpress.com/2015/02/06/online-formula-bayesian-ab-testing/ #processing #machinelearning

#BigData Lessons From Netflix http://insights.wired.com/m/blogpost?id=6544125%3ABlogPost%3A82535 #processing

Practical fault detection & alerting. You don't need to be a data scientist http://dieter.plaetinck.be/practical-fault-detection-alerting-dont-need-to-be-data-scientist.html #processing #machinelearning

Liquid: Unifying nearline and offline #bigdata integration http://blog.acolyer.org/2015/02/04/liquid-unifying-nearline-and-offline-big-data-integration/ #processing

Switching Careers: From Java to Big Data / Hadoop. http://www.edureka.co/blog/switching-careers-from-a-java-to-big-data-hadoop/ #bigdata

11 interesting #BigData case studies in Telecom http://www.bigdata-madesimple.com/11-interesting-big-data-case-studies-in-telecom/ #processing #strategy

Paxos, a really beautiful protocol for distributed consensus http://www.goodmath.org/blog/2015/01/30/paxos-a-really-beautiful-protocol-for-distributed-consensus/ #bigdata #processing

The Big-Data Tool Spark May Be Hotter Than Hadoop, But It Still Has Issues http://globalbigdataconference.com/news/26988/the-big-data-tool-spark-may-be-hotter-than-hadoop-but-it-still-has-issues.html #bigdata

Year 2014 in Review as Seen by an Event Detection System http://www.kdnuggets.com/2015/01/year-2014-review-event-detection-system.html #processing

Kafka @ LinkedIn: current and future http://engineering.linkedin.com/kafka/kafka-linkedin-current-and-future #processing

RT @KirkDBorne: Practical illustration of #MapReduce operations on data: http://bit.ly/1CuYPvX #abdsc #BigData #Analytics

Presentation

RT @Data_Informed: Capturing the Business Value of #BigData in Real Time http://hubs.ly/y0s49L0

Interesting #visualisations: The Impact of Vaccines http://graphics.wsj.com/infectious-diseases-and-vaccines/

RT @keen_io: Open source dashboard templates for bootstrap https://cards.twitter.com/cards/5pcbj1/8d76

Other

Adopting Microservices at Netflix: Lessons for Architectural Design http://nginx.com/blog/microservices-at-netflix-architectural-best-practices/

RT @KirkDBorne: #BigData #Analytics startup @SqrrlData raises $7M: http://m.bizjournals.com/boston/blog/startups/2015/02/big-data-analytics-startup-sqrrl-raises-7m.html

Author

Tarjei Romtveit

Co-founder of Monokkel with solid experience in systems design, data management, data analysis, software development and agile processes.