Stuff the internet says about data for October 10th - October 18th, 2014

Some quite interesting observations this week suggest that MongoDB is finding it difficult to survive in companies that need scaling and flexible analytics capabilities. The overall trend seems to be that persistence technologies focused on scale, like Cloudera, Cassandra and HBase, are complemented by analytical engines like Neo4j, Elasticsearch, Spark, Storm, etc. The Swiss Army knife may not be suitable for building skyscrapers after all.

As usual, there are plenty of reminders that data is important and that organizations that do not embrace the data field are doomed. Some marginal discussions also cover methodical and organizational approaches to getting a grip on the data field. In many cases this is more interesting than throwing a tool at the problem and hoping for a magical solution. It will be very interesting to see what the future brings in this area.

Persistence

Cassandra error handling done right: http://www.datastax.com/dev/blog/cassandra-error-handling-done-right #nosql via @DataStax

If you are write-heavy, never shard on dates or incrementing ids: http://java.dzone.com/articles/sharding-pitfalls-part-i #nosql via @dzone
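Why incrementing keys hurt write-heavy workloads: with range-based sharding, every new id is larger than all existing ones, so every insert lands on whichever shard owns the top of the key range. The following is a minimal, database-agnostic Python simulation. The shard boundaries, the shard count and the MD5-based hash sharding it compares against are illustrative assumptions, not how any particular product assigns keys.

```python
import hashlib
from collections import Counter
from typing import Sequence

NUM_SHARDS = 4  # assumption: a small, fixed cluster for illustration

def range_shard(key: int, upper_bounds: Sequence[int]) -> int:
    # Range sharding: shard i holds keys below upper_bounds[i];
    # the last shard is open-ended and takes everything above the final bound.
    for i, bound in enumerate(upper_bounds):
        if key < bound:
            return i
    return len(upper_bounds)

def hash_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    # Hash sharding: a stable hash of the key decides the shard,
    # so consecutive ids scatter across the cluster.
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical setup: existing ids 0..999 were split into 4 ranges.
bounds = [250, 500, 750]

new_ids = range(1000, 2000)  # new writes with incrementing ids
range_load = Counter(range_shard(k, bounds) for k in new_ids)
hash_load = Counter(hash_shard(k) for k in new_ids)

print("range:", dict(range_load))  # every new write hits the last shard
print("hash: ", dict(hash_load))   # writes spread over all shards
```

The trade-off is that hashing gives up cheap range scans on the key, which is one reason why, for time-based data, a compound key (some well-distributed prefix plus the timestamp) is often used instead.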

Which companies have moved away from MongoDB and why? http://www.quora.com/Which-companies-have-moved-away-from-MongoDB-and-why via @Quora

Apache Kafka Integration http://www.cloudera.com/content/cloudera/en/developers/home/cloudera-labs/apache-kafka.html #bigdata #nosql via @cloudera

Getting Started with Time Series Data Modeling http://planetcassandra.org/getting-started-with-time-series-data-modeling/ #nosql #cassandra via @PlanetCassandra
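A common idea in Cassandra time series modeling is to add a time bucket (for example the calendar day) to the partition key, so that one partition per source does not grow without bound, and to keep rows inside a partition ordered by time. The sketch below only mimics that layout in plain Python; it uses no Cassandra driver, and the `(station_id, event_time, temperature)` reading shape and the per-day bucket are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def partition_key(station_id: str, event_time: datetime) -> tuple:
    # Hypothetical compound partition key: source id plus a per-day bucket,
    # so each partition holds at most one day of readings for one station.
    return (station_id, event_time.strftime("%Y-%m-%d"))

def bucket_readings(readings):
    partitions = defaultdict(list)
    for station_id, event_time, temperature in readings:
        partitions[partition_key(station_id, event_time)].append((event_time, temperature))
    # Within a partition, keep rows ordered by time, newest first,
    # similar to a descending clustering column.
    for rows in partitions.values():
        rows.sort(key=lambda row: row[0], reverse=True)
    return dict(partitions)

# Four hourly readings that straddle midnight end up in two day buckets.
start = datetime(2014, 10, 17, 22, 0)
readings = [("station-1", start + timedelta(hours=h), 10.0 + h) for h in range(4)]
parts = bucket_readings(readings)
for key, rows in sorted(parts.items()):
    print(key, len(rows))
```

The bucket size is a judgment call: smaller buckets keep partitions small, but a query over a long time span then has to read more partitions.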

Facebook's greatest technical accomplishments: Consistency across data centers ++ http://www.quora.com/What-have-been-Facebooks-greatest-technical-accomplishments via @Quora #bigdata

Processing

LinkedIn opens the Economic Graph challenge http://linkurio.us/linkedin-opens-economic-graph-challenge/

12 things I hate about Hadoop http://www.infoworld.com/article/2833851/application-development/12-things-i-hate-about-hadoop.html via @infoworld

".. seek out only the data you need to address it and apply sophisticated predictive and prescriptive analytics" http://data-informed.com/biggest-misconception-big-data/ via @Data_Informed

RT @foundsays: Elasticsearch from the Top Down - New article by @alexbrasetvik
Go grab a coffee and put on your diving goggles! https://found.no/foundation/elasticsearch-top-down/

RT @jboner: Wired on the #Spark world record: 'Startup Crunches 100 Terabytes of Data in a Record 23 Minutes': http://www.wired.com/2014/10/startup-crunches-100-terabytes-data-record-23-minutes/

Mobile Video Big Data Architecture with Spring XD/Hadoop/HAWQ/Redis: Measuring Live Usage http://shar.es/1m4vD0

RT @monowai: #Ebola tweets via @FlockDataCom. Thanks @halffinn. #NLP co-occur relationships vized in #D3js via #Neo4j http://t.co/Q4OU4508NV

RT @KirkDBorne: As #IoT looms, survey finds growing urgency among companies to adopt #BigData and Predictive #Analytics http://buff.ly/1s7RLjg

Presentation

Use data or be data: http://radar.oreilly.com/2014/10/use-data-or-be-data.html #bigdata via @BigDataStartups

The Future of Graph Visualization http://keylines.com/events-and-hangouts/future-graph-visualization (would have been great to be there!)

Facebook's #bigdata mistake https://www.linkedin.com/pulse/article/20141015195513-247423-facebook-s-big-data-mistake?trk=nus-cha-roll-art-title (but honestly: You need to test your data assumptions) via @rakeshlobster

Using Data for Good: Jake Porway Talk: http://blog.mortardata.com/post/99997015326/using-data-for-good-jake-porway-talk-video-slides #bigdata

Moneyball: How businesses are using data to outsmart their rivals http://www.cnn.com/2014/10/13/business/moneyball-businesses-outsmarting-rivals/index.html

Never judge a visualization by its bubbles: http://www.thefunctionalart.com/2014/10/never-judge-visualization-by-its-bubbles.html

Author

Tarjei Romtveit

Co-founder of Monokkel with solid experience in systems design, data management, data analysis, software development and agile processes.