Top Posts Tagged with #tuplejump

What Apple's 2016 acquisitions tell us about 2017 and beyond

When you look at Apple’s 2016 acquisitions, “the focus for Apple seems centered around artificial intelligence, augmented reality, enterprise and education,” Carolina Milanesi writes for Tech.pinions.

“I have mentioned before how my relationship with Siri has been improving over time. This is good and bad at the same time. Good because I appreciate it. Bad because I want more,”…

#Flyby Media #Gliimpse #LearnSprout #LegbaCore #Tuplejump #Turi

Named Tuplejump, the company, founded in India, had as its main line of business the simplification of data management techniques...

#tuplejump

Apple acquires Tuplejump, a machine learning company

Apple has acquired Tuplejump, a machine learning startup. Founded in 2013, Tuplejump is a Hyderabad-based firm established by Rohit Rai, Deepak Alur and Satyaprakash Buddhavarapu. It optimises and unifies workflows for analysts, engineers, and data scientists by providing solutions for data processing and reporting. It also enables firms to store and process big data. The website of…

#acquisition of Apple #Apple acquisition #Apple News #Apple Tuplejump #Tuplejump #Tuplejump deal #Tuplejump News

Apple Acquires Machine Learning Startup Tuplejump

Apple recently acquired its third machine learning company since 2015, purchasing India-based company Tuplejump, reports TechCrunch. Tuplejump focused on simplifying data management techniques and creating tools to make it easy to deal with large quantities of data. From an archive of the now-defunct Tuplejump website:

A few years ago people realized that the volume of data that businesses…

#apple #Tuplejump

Ringmaster - The Cassandra backed cluster resource manager

Today we introduce Ringmaster, a masterless, Cassandra-backed cluster manager. It can currently run Spark applications and is extensible to support other frameworks and applications.

A couple of years ago we started moving beyond Hadoop. We built the Tuplejump platform by integrating Cassandra and Spark. In development we use Spark Standalone mode, while in production we run Spark on Apache Mesos. This setup works great, but we want to take a step further towards our original goal of one-click, fast Big Data cluster deployment. The current setup still has additional moving parts: the Spark server, Mesos, ZooKeeper, and the tools to deploy, monitor and manage them. With Ringmaster we move towards removing these.

Ringmaster is a much simplified take on the Mesos cluster manager that currently targets running Spark jobs. We will be extending it to support Hydra, our stream processing engine, but we don’t intend it to replace Mesos as a generic cluster management system, so it will always remain a small and easy-to-manage code base. Moreover, it builds on Cassandra’s infrastructure and functionality, which are proven and battle-hardened by years in production.

Before we dive into the workings of Ringmaster, let’s take a quick look at a Cassandra cluster. The figure below shows a typical Cassandra ring, with tokens assigned T-0, T-1, … T-11.

Cassandra ring with Tokens T-0 to T-11

In this ring of 12 nodes, Cassandra first calculates the hash token for a row and places the row on the corresponding node. The row is then replicated to other nodes based on the replication factor and replication strategy. To learn more about replication in Cassandra and how the various strategies affect where data is replicated, read up here.

Let us take the simplest case. Suppose this ring uses the simple replication strategy with a replication factor of 3. A row that hashes into the range between T-0 and T-1 will first be persisted on the zeroth node and then replicated on the first and second nodes.

Simple Replication for Factor 3 on Node T-0
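The placement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token space of 1200 with evenly spaced tokens, the MD5-based `token_for` helper, and the function names are all hypothetical, and the ownership rule follows the convention used in this post (node T-i owns the range from T-i up to T-(i+1)), not necessarily Cassandra's actual range assignment.

```python
import hashlib
from bisect import bisect_right

NUM_NODES = 12
RING_SIZE = 1200  # hypothetical token space, chosen for readability
TOKENS = [i * (RING_SIZE // NUM_NODES) for i in range(NUM_NODES)]  # T-0 .. T-11


def token_for(row_key):
    """Hash a row key onto the toy ring (illustrative, not Cassandra's partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def primary_node(row_token):
    """Index of the node whose range [T-i, T-(i+1)) contains the token."""
    return (bisect_right(TOKENS, row_token % RING_SIZE) - 1) % NUM_NODES


def simple_replicas(row_token, replication_factor=3):
    """Primary node followed by the next nodes clockwise on the ring."""
    first = primary_node(row_token)
    return [(first + k) % NUM_NODES for k in range(replication_factor)]
```

With these assumptions, a token of 50 falls between T-0 and T-1, so `simple_replicas(50)` yields nodes 0, 1 and 2, and a token near the end of the ring wraps around to the first nodes.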

To set up Spark in this cluster we move from the simple replication strategy to the network topology replication strategy, placing the nodes that will run analytics tasks in a different virtual datacenter and installing Spark workers on them. The first datacenter (DC1) will be used by the high-performance realtime read/write clients, while the second (DC2) will be kept exclusively for Spark and Shark.

If we put every 4th node in DC2, the ring will look like this.

Multi DC Cassandra Ring

Here the row R1, which hashes into the range of T-0, is saved on node T-0 and replicated clockwise until it is also replicated on node T-3 in DC2. This way, all the data written to nodes in DC1 gets replicated to nodes in DC2.

This mechanism ensures two things: the read/write performance of the clients is not affected by the analytics jobs, and the Spark workers get data locality on the analytics side.
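The multi-datacenter placement can be sketched as a clockwise walk that stops once every datacenter has its quota of replicas. This is a simplified illustration, not Cassandra's implementation: it ignores racks, and the per-datacenter replica counts (3 in DC1, 1 in DC2) and all function names are assumptions made for the example.

```python
NUM_NODES = 12


def datacenter_of(node):
    """Every 4th node (T-3, T-7, T-11) sits in the analytics datacenter DC2."""
    return "DC2" if node % 4 == 3 else "DC1"


def topology_replicas(primary, replicas_per_dc):
    """Walk clockwise from the primary node until each DC has its quota."""
    remaining = dict(replicas_per_dc)  # copy so the caller's dict is untouched
    chosen = []
    for step in range(NUM_NODES):
        node = (primary + step) % NUM_NODES
        dc = datacenter_of(node)
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
        if not any(remaining.values()):
            break  # every datacenter is satisfied
    return chosen
```

Under these assumed replica counts, a row whose primary is T-0 lands on T-0, T-1 and T-2 in DC1 and on T-3 in DC2, which matches the walk described for row R1.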

The complications arise when we add the Spark master, its backup, and the ZooKeeper ensemble that enables master switchover. Our current production clusters look somewhat like this (Mesos is left out for clarity).

Current Tuplejump Cluster Setup

As you can see, this requires us to maintain 5 additional servers with different software apart from our Cassandra ring, and even within the Cassandra ring different pieces have to be installed on different nodes. DC1 needs only Cassandra, while DC2 requires Cassandra and Spark workers to be installed, with the Spark workers configured to use the Spark masters via ZooKeeper. This is where the cluster deployment and configuration management business thrives, and for our needs we use Puppet, Chef, Salt Stack, etc., depending on what the client’s team is familiar with or uses internally.

Let’s quickly review the problems with our current setup:

Additional infrastructure

Additional software components and configuration

Complex deployment

Removing ZooKeeper and the backup server creates a SPOF

Some time back, through our partnership with Imaginea, we built a PoC of a masterless Spark Standalone cluster and have already created a pull request to Apache Spark for it (https://github.com/apache/incubator-spark/pull/172). This was the first cut of Ringmaster, or project-rm. It uses Akka Cluster as the backbone and actor messaging to communicate among the workers. The system is masterless: your application can connect to any of the nodes, and that node will create a dedicated scheduler for it. The state of the scheduler is replicated over the cluster, and a cluster listener initiates another actor to take over if one of the nodes running a scheduler goes down.

Akka powered RM

The scheduler is a modified Spark scheduler that communicates with the worker actors using Akka messaging: it sends resource requests, receives offers, and triggers tasks on available workers.
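To make the request/offer/launch cycle concrete, here is a toy matcher in Python. It is not the Spark scheduler or the Ringmaster code: the function name, the use of CPU cores as the only resource, and the greedy first-fit policy are all assumptions chosen to keep the sketch short.

```python
def assign_tasks(pending_tasks, offers):
    """Greedily match tasks to worker offers.

    pending_tasks: list of (task_id, cores_needed) tuples
    offers: dict mapping worker name -> free cores offered
    Returns (assignments, still_pending).
    """
    free = dict(offers)  # copy so the caller's offers are not mutated
    assignments = {}
    still_pending = []
    for task_id, cores_needed in pending_tasks:
        # First worker whose remaining offer is large enough for this task.
        worker = next(
            (name for name, cores in free.items() if cores >= cores_needed),
            None,
        )
        if worker is None:
            still_pending.append((task_id, cores_needed))
            continue
        free[worker] -= cores_needed
        assignments[task_id] = worker
    return assignments, still_pending
```

Tasks that no current offer can satisfy stay pending, which in a real scheduler would mean waiting for the next round of offers from the workers.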

What the diagram doesn’t show is two additional actors, the Cluster monitor and the State store, which respectively monitor the cluster node membership messages and store and replicate the scheduler’s state. This resource manager solves 2 of our issues with the current setup:

There is no single point of failure: if a node dies, the client can connect to another node and the scheduler will run there, acting as the new master

Since the scheduler is co-located on the worker node and doesn’t need ZK for reliability, we don’t require the additional infrastructure
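From the client's point of view, the absence of a single point of failure amounts to trying the next node when one is unreachable. A minimal sketch, assuming a caller-supplied `connect` function that raises `ConnectionError` on failure (both the function and its behaviour are hypothetical, not part of Ringmaster's API):

```python
def connect_with_failover(nodes, connect):
    """Try each node in turn and return (node, connection) for the first success."""
    errors = {}
    for node in nodes:
        try:
            return node, connect(node)
        except ConnectionError as exc:
            errors[node] = exc  # remember why this node was skipped
    raise ConnectionError(f"no reachable node, errors: {errors}")
```

Because any node can host the scheduler, the order of `nodes` does not matter for correctness, only for which node ends up acting as the master for this application.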

We still need to deploy multiple software components, and although deployment is much simpler than before (configuring the Akka cluster is the only requirement in addition to configuring Cassandra), it is still more than just Cassandra. Also, the functionality built into Cassandra overlaps with the functionality we use from Akka Cluster (cluster messaging, node membership, replicated state store, etc.). And thus began the next phase for Ringmaster: we started replacing the functionality we had built over Akka Cluster with its counterpart in Cassandra. We still use Akka actors, but we don’t use Akka Cluster. We now run Ringmaster as a javaagent in the Cassandra JVM, so it has access to all the internals of Cassandra. The major functionality we used from Cassandra was:

Akka Cluster node membership is replaced by Cassandra peer membership. One Akka actor on each node listens to the peer state messages in Cassandra and acts on them if it is the “clockwise adjacent node” in the datacenter to the node going down.

The remote actor communication is done through custom-defined Cassandra messages.

The scheduler state is stored in Cassandra, which takes care of replication.

Cassandra Backed RM
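The “clockwise adjacent node” rule can be sketched as follows. This is an illustration only: it reuses the example layout from earlier in the post (12 nodes, every 4th node in DC2), assumes that “the datacenter” means the datacenter of the failed node, and uses hypothetical function names.

```python
NUM_NODES = 12


def datacenter_of(node):
    """Every 4th node (T-3, T-7, T-11) sits in the analytics datacenter DC2."""
    return "DC2" if node % 4 == 3 else "DC1"


def clockwise_neighbour_in_dc(failed_node, live_nodes):
    """First live node clockwise from the failed node within the same datacenter."""
    dc = datacenter_of(failed_node)
    for step in range(1, NUM_NODES):
        candidate = (failed_node + step) % NUM_NODES
        if candidate in live_nodes and datacenter_of(candidate) == dc:
            return candidate
    return None  # no live node is left in that datacenter


def should_take_over(me, failed_node, live_nodes):
    """Each node runs this check when it sees a peer go down."""
    return clockwise_neighbour_in_dc(failed_node, live_nodes) == me
```

Since every node evaluates the same deterministic rule against the same membership view, exactly one node decides to act, without any coordinator.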

Once Ringmaster goes into production, this is how our cluster will look. There are no additional moving parts to the engine, just the Cassandra cluster with the javaagent on every node. The agent gets activated only on nodes whose datacenter is marked as analytics or that are marked with the Analytics role. More on that next time.

We will soon be releasing early access to Ringmaster for you to sink your teeth into, followed by the codebase and general availability, as we do with all our other components. In the meantime, please let the comments and queries flow in.

#cassandra #akka #tuplejump #ringmaster #spark #mesos #hadoop

Introducing Calliope

We at Tuplejump have been using the combination of Spark and Cassandra for a while. After trying many different stacks, with Hadoop, Twister, QFS and many other permutations and combinations, we fell in love with Spark+Cassandra. They are magical together…

Cassandra gives you the ability to store your data in a structured/semi-structured way and rocks on performance. Spark complements it by providing a high-performance data processing and analytics system to process this data. Though the ‘tuplejump platform’ is still in development, we have decided to start releasing our core runtime engine as open source for everyone to get a jumpstart on their big data dreams.

As part of this effort, we started with a limited release of Calliope, our library to assist in accessing Cassandra from Spark. We wanted to validate the API before a wider release, and we have benefitted a lot from the feedback. It has also helped us lay out our roadmap for the library.

Today we want to expand the limited release and invite more people to try it out.

You can find more details about the project here - http://tuplejump.github.io/calliope/

What Calliope is today:

A library that simplifies reading data from Cassandra in Spark and writing RDDs from Spark out to Cassandra. We also have a Spark streaming to Cassandra example.

What’s cooking in our labs,

1. API improvement to the existing library
2. Shark integration (so we can also do sql2rdd in Spark)
3. A DFS built on Cassandra, so we don’t need HDFS/Hadoop anymore!
4. Streaming Cassandra mutations to Spark Stream

#tuplejump #spark #cassandra
