Stuff the internet says about data for April 1st - May 1st, 2015

Happy reading!

Persistence

#Neo4j Text Classification API http://graphify.github.io/graphify/ #processing

Introducing the New Cypher Query Optimizer http://java.dzone.com/articles/introducing-new-cypher-query #neo4j #processing

Thinking outside the Graph: Data #Virtualization and Graph Databases http://www.datavirtualizationblog.com/thinking-outside-the-graph-data-virtualization-and-graph-databases/ #bigdata

Presentation

Explained Visually is an experiment in making hard ideas intuitive http://setosa.io/ev/

#BigData: Amazing numbers 2015! https://www.linkedin.com/pulse/big-data-amazing-numbers-2015-bernard-marr

Processing

Project Tungsten: Bringing Spark Closer to Bare Metal https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html #processing #bigdata

Can Spark Streaming survive Chaos Monkey? http://techblog.netflix.com/2015/03/can-spark-streaming-survive-chaos-monkey.html?m=0 #processing

Apache Hadoop 2.7.0 Released! http://hortonworks.com/blog/apache-hadoop-2-7-0-released/

The Open Road to Galene, LinkedIn’s Search Architecture http://thenewstack.io/the-open-road-to-galene-linkedins-new-search-architecture/ #processing

Apache Kafka is the circulatory system in use at LinkedIn. http://pandawhale.com/post/60475/apache-kafka-is-the-circulatory-system-in-use-at-linkedin #processing

Announcing Apache Spark, Now GA on Hortonworks Data Platform http://hortonworks.com/blog/announcing-apache-spark-now-ga-on-hortonworks-data-platform/ #processing

Machine learning diagnoses Parkinson's http://medicalphysicsweb.org/cws/article/research/60766 #bigdata

How many shards should Elasticsearch indexes have? http://cpratt.co/how-many-shards-should-elasticsearch-indexes-have/ #processing

Real time full text search with Luwak and Samza: http://blog.confluent.io/2015/04/13/real-time-full-text-search-with-luwak-and-samza/ #enterprisesearch #processing #bigdata

Breaking through the speed barrier with Spark Streaming – Part 1 http://blog.triggar.com/breaking-through-the-speed-barrier-with-spark-streaming/ #bigdata #processing

How PayPal beats the bad guys with machine learning http://www.infoworld.com/article/2907877/machine-learning/how-paypal-reduces-fraud-with-machine-learning.html #bigdata

Distributed lambda architecture http://www.cakesolutions.net/teamblogs/distributed-lambda-architecture #processing #bigdata

Lambda Architecture explained http://pandawhale.com/post/60473/lambda-architecture-explained #bigdata #hadoop

Announcing Apache Ambari 2.0 http://hortonworks.com/blog/announcing-apache-ambari-2-0/ #hadoop

Apache Ambari paves path to easier #hadoop: http://www.infoworld.com/article/2907535/hadoop/apache-project-ambari-paves-path-to-easier-hadoop.html #bigdata

Running PageRank Hadoop job on AWS Elastic MapReduce http://java.dzone.com/articles/running-pagerank-hadoop-job #bigdata by @paskal_1973

The new age of algorithms http://banknxt.com/49972/new-age-of-algorithms/ #bigdata #fintech

Overcoming Missing Values In A Random Forest Classifier http://nerds.airbnb.com/overcoming-missing-values-in-a-rfc/

Other

Consul Service Discovery and Health For Microservices Architecture Tutorial http://www.mammatustech.com/consul-service-discovery-and-health-for-microservices-architecture-tutorial

Is Smart Data Rendering #BigData Obsolete? http://blog.surveyanalytics.com/2015/04/is-smart-data-rendering-big-data.html?m=1

RT @Sve_Sic: #Analytics and information management roles rise and fall

Code focused development. Redefined https://code.visualstudio.com/

Author

Tarjei Romtveit

Co-founder of Monokkel with solid experience in systems design, data management, data analysis, software development and agile processes.