Stuff the internet says about data for October 26th - November 23rd, 2014

Several interesting articles on data were published in November. I would particularly highlight the Ford story if you are interested in good cases and implementation strategies for making your company a data champion. On the technical side, Neo4j continues to make its mark, and new applications seem to pop up every day or two.

Methods and strategy

How Ford Uses Data Science: Past, Present and Future http://dataconomy.com/how-ford-uses-data-science-past-present-and-future/ #bigdata

Forget '#BigData.' Beware 'Little Data' -- and the Horrors of TMI http://adage.com/article/the-media-guy/forget-big-data-beware-data-horrors-tmi/295575/ (Some truth in this)

Persistence

Deep Learning Sentiment Analysis for Movie Reviews using @neo4j http://pandawhale.com/post/50813/deep-learning-sentiment-analysis-for-movie-reviews-using-neo4j

The eBay Secret to Database Scaling http://java.dzone.com/articles/ebay-secret-database-scaling #nosql #mongodb

RT @neo4j: Great presentation @GitHub Event Archive import to Neo4j by @ikwattro https://github.com/ikwattro/gh4j, http://slideshare.net/christophewillemsen/github-eventsneo4j

Why I love databases: https://medium.com/@jeeyoungk/why-i-love-databases-1d4cc433685f #nosql

Great read: Data replication in #nosql databases explained: http://planetcassandra.org/data-replication-in-nosql-databases-explained/

The Netflix Tech Blog: Introducing Dynomite - Making Non-Distributed Databases, Distributed: http://techblog.netflix.com/2014/11/introducing-dynomite.html #nosql

Why your old SAN doesn't scale: http://www.infoworld.com/article/2839997/infrastructure-storage/why-your-old-san-doesnt-scale.html via @infoworld #bigdata

Review: Connect your data better with @Neo4j http://www.infoworld.com/article/2839445/nosql/review-neo4j-connect-data-better.html#twitter via @infoworld

Sharding Pitfalls Part III: Chunk Balancing and Collection Limits http://java.dzone.com/articles/sharding-pitfalls-part-iii #nosql

RT @databasetube: Comparing Riak & Cassandra NoSQL Databases http://www.databasetube.com/nosql/dynamic-dynamos-comparing-riak-and-cassandra/ #nosql #database

Processing

Understand Your Problem and Get Better Results Using Exploratory Data Analysis http://machinelearningmastery.com/understand-problem-get-better-results-using-exploratory-data-analysis/ #bigdata

Google Mines Gmail for #BigData Gold https://medium.com/@jeffgould/google-mines-gmail-for-big-data-gold-cea42e9b88ee

Hadoop Vendor Hortonworks Has Filed For an IPO http://dataconomy.com/hadoop-vendor-hortonworks-has-filed-for-an-ipo-looks-to-successful-trading-offering-impetus-to-all-round-growth/ #bigdata

Start with Good Science on Good Data, Then we'll Talk #BigData http://www.datasciencecentral.com/m/blogpost?id=6448529%3ABlogPost%3A220775

An Introduction to Minimum Viable Architecture http://www.infoq.com/news/2014/11/minimum-viable-architecture via @InfoQ

Big Data Survey: Trouble Brewing For IT http://www.informationweek.com/big-data/big-data-analytics/big-data-survey-trouble-brewing-for-it/d/d-id/1317354 via @yeomantechnolog

Flafka: Apache Flume Meets Apache Kafka for Event Processing http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/ #bigdata

Sentiment analysis is largely meaningless. These firms (and Watson) are trying to change that. http://www.fastcompany.com/3037915/the-problem-with-sentiment-analysis #bigdata

Using Apache Spark and Neo4j for #BigData Graph Analytics: http://www.kennybastani.com/2014/11/using-apache-spark-and-neo4j-for-big.html

@buffer’s New Data Architecture: Analyze 500 Million Records in Seconds https://overflow.bufferapp.com/2014/10/31/buffers-new-data-architecture/

ElasticSearch and Apache Spark integration: http://blog.qbox.io/elasticsearch-in-apache-spark-python via @sloanahrens

Text Analytics and Natural Language Processing in the Era of #BigData http://blog.pivotal.io/data-science-pivotal/features/text-analytics-and-natural-language-processing-in-the-era-of-big-data

Presentation

The One Language A Data Scientist Must Master http://dataconomy.com/the-one-language-a-data-scientist-must-master/ #bigdata

Ebola and #bigdata waiting on hold http://www.economist.com/news/science-and-technology/21627557-mobile-phone-records-would-help-combat-ebola-epidemic-getting-look?frsc=dg%7Cd

Maps of a most unusual sort: http://www.kidsdiscover.com/teacherresources/maps-of-a-most-unusual-sort/ #visualization

When Our Brain Uses Big Data to Overfit Theories https://medium.com/@chris_bour/when-our-brain-uses-big-data-to-overfit-theories-3e5f25dc4cbc

Visualised: how Ebola compares to other infectious diseases http://wp.me/p2sZp-1Lq via @wordpressdotcom

Author

Tarjei Romtveit

Co-founder of Monokkel with solid experience in systems design, data management, data analysis, software development and agile processes.