Oct 22, 2014

Spark with YARN on an Amazon EMR cluster updated

Amazon updated Spark on an EMR cluster to version 1.x.

Spark has graduated to 1.x, which guarantees a stable core API across all 1.x releases; Shark has been deprecated in favor of Spark SQL; and Spark can now run on top of YARN (the resource manager for Hadoop 2). In light of these changes, we have revised the bootstrap action to install Spark 1.x on our Hadoop 2.x AMIs and run it on top of YARN. The bootstrap action also installs and configures Spark SQL, Spark Streaming, MLlib, and GraphX.


[source]

Oct 21, 2014

Diseases that Kill Russians

Today I built some visualizations from Rosstat data (Rosstat is the Federal State Statistics Service) and decided to post them.

The first chart shows the incidence of the major classes of diseases in 2012.

The second chart shows the incidence of the major classes of diseases in 2012, per 1,000 people.
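For readers who want to reproduce the per-1,000 normalization, here is a minimal Python sketch of the arithmetic. The function name, the class labels, and all numbers are hypothetical placeholders chosen for illustration; they are not Rosstat figures and not necessarily how the chart itself was built.

```python
def incidence_per_1000(cases, population):
    """Return the number of cases per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round inputs give exact results.
    return cases * 1000 / population


# Placeholder inputs for illustration only (NOT real statistics).
example_population = 1_000_000
example_cases = {"class A": 50_000, "class B": 12_500}

# Normalize every disease class by the same population.
rates = {
    name: incidence_per_1000(count, example_population)
    for name, count in example_cases.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 people")
```

Dividing every class by the same population keeps the ranking of classes unchanged; the benefit is that the values become comparable across years or regions with different population sizes.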


The diagram below shows the disease statistics in chronological order. It shows that the ratio barely changes over time.




Tableau Software is “Certified on Spark”

Tableau’s integration with Spark brings tremendous value to the Spark community – users can visually analyze their data without writing a single line of Spark SQL code.

That’s a big deal because creating a visual interface to your data expands the Spark technology beyond data scientists and data engineers to all business users. The Spark connector takes advantage of Tableau’s flexible connection architecture that gives customers the option to connect live and issue interactive queries, or use Tableau’s fast in-memory database engine.


[source]