The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project
This post was first published on the Apache Software Foundation Blog.
World's first schema-free SQL query engine brings self-service data exploration to Apache Hadoop™
Forest Hill, MD – 02 December 2014 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 200 Open Source projects and initiatives, announced today that Apache™ Drill™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache Drill is the world's first schema-free SQL query engine that delivers real-time insights by removing the constraint of building and maintaining schemas before data can be analyzed. Drill users can run interactive ANSI SQL queries on complex or constantly evolving data including JSON, Parquet, and HBase without ever worrying about schema definitions. As a result, Drill not only enables rapid application development on Apache Hadoop™ but also allows enterprise BI analysts to access Hadoop in a self-service fashion.
"Apache Drill's graduation is a testament to the maturity of the technology and a strong indicator of the active community that develops and supports it," said Jacques Nadeau, Vice President of Apache Drill. "Drill's vibrant community ensures that it will continue to evolve to meet the demands of self-service data exploration use cases."
While providing faster time to value from data stored in Hadoop, Drill also reduces the burden on IT developers and administrators who prepare and maintain datasets for analysis. Analysts can explore data in real-time, pull in new datasets on the fly, and also use traditional BI tools to visualize the data easily – all by themselves.
Inspired by Google's Dremel (an academic paper on interactive analysis of Web-scale datasets), and a vision to support modern big data applications, Drill entered the Apache Incubator in August 2012. The project currently has code contributions from individual committers representing MapR, LinkedIn, Hortonworks, Pentaho, and Cisco, among others.
"We see the Apache Top-Level Project status as a major milestone for Drill. With a growing user base and diverse community interest, we are excited that Drill will indeed be a game changer for Hadoop application developers and BI analysts alike," said Tomer Shiran, member of the Apache Drill Project Management Committee.
Availability and Oversight
As with all Apache products, Apache Drill software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Drill, visit http://drill.apache.org and https://twitter.com/ApacheDrill
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than two hundred leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.
How to Turn Raw Data from Yelp into Insights in Minutes with Apache Drill
This blog first appeared on MapR.com
In this blog post, I want to briefly discuss one of the key use cases for Drill: exploring and analyzing the raw data coming into Hadoop/NoSQL systems, using SQL.
Wait...isn’t this the purpose of all the SQL-on-Hadoop systems out there?
Yes.
However, the key difference is Drill’s agility and flexibility. Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low latency performance at scale, Drill allows users to analyze the data without any ETL or upfront schema definitions. The data could be in any file format such as text, JSON or Parquet. Data could have simple types such as string, integer, dates or more complex multi-structured data such as nested maps and arrays. Data could be in any file system, whether local or distributed, such as HDFS, MapR FS or S3. Drill, with its “no schema” approach, lets you get value from your data in just a few minutes.
Let’s walk through this with a quick example activity. The publicly available dataset used for this example is downloaded from Yelp (business reviews) and is in JSON format.
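To make the "no schema" idea concrete, here is a minimal sketch in plain Python of the kind of work a schema-free engine does for you: it reads JSON records whose fields vary from line to line and aggregates over them without any upfront table definition. The records and field names below are hypothetical stand-ins, not actual Yelp data, and this is only an illustration of the concept, not how Drill is implemented.

```python
import json
from collections import defaultdict

# Hypothetical records, loosely shaped like business listings; not real Yelp data.
# Note that the fields differ from line to line (nested map, array, or neither).
RAW_LINES = [
    '{"name": "Cafe A", "city": "Phoenix", "stars": 4.5, "attributes": {"wifi": "free"}}',
    '{"name": "Diner B", "city": "Phoenix", "stars": 3.5, "categories": ["Diner", "Breakfast"]}',
    '{"name": "Bar C", "city": "Tempe", "stars": 4.0}',
]


def average_stars_by_city(lines):
    """Average the 'stars' field per 'city' without declaring a schema first."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum of stars, record count]
    for line in lines:
        record = json.loads(line)  # structure is discovered at read time
        city = record.get("city")
        stars = record.get("stars")
        if city is None or stars is None:
            continue  # tolerate records that lack one of the fields
        totals[city][0] += stars
        totals[city][1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


result = average_stars_by_city(RAW_LINES)
print(result)
```

In SQL terms this is roughly a GROUP BY on city with AVG(stars); the point is that nothing had to be declared before the data could be read.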
To continue reading the rest of this blog post, please click here.
Gaining Insight into Data with Apache Drill and MicroStrategy Analytics Desktop
Drill enables self-service data discovery with data stored in Hive, JSON, CSV and other formats. This post shows how to use Drill with MicroStrategy and includes a demo of solving a sales problem.
Click the Test button; it may take a while the first time, but it should come back with Success. Don't click the Explore button or try to look for tables or columns, because a few of the JDBC calls throw an Unsupported Operation error. The reason I've found is usually that a Drill class has subclassed an Avatica class and hasn't overridden the methods that PDI is calling. This is common with newer technologies that provide JDBC drivers; the JDBC API is huge, so many young drivers don't implement all the methods.
6) Click OK to exit the Database Connection dialog
7) Enter your SQL query in the Table Input step
8) Click Preview or run the transformation
I was able to run queries like the ones in the wiki (see link above) as well as things like:
SELECT GENDER, AVG(salary) AS AVG_SALARY FROM cp.`employee.json` GROUP BY GENDER
I haven't tried this with anything but Table Input. In my experience if I am getting UnsupportedOperationExceptions in the Database Connection Dialog, I won't get very far with other Pentaho tools. This is due to the lack of implemented methods in the driver. As a Pentaho employee, I've done a few things to "fix" these on the fly. I looked into doing the same for Drill while trying to create a plugin for it, but I spent too many hours in Dependency Hell and eventually gave up (so far) trying to embed a Drill client/instance in a PDI plugin. Now that I have it working with Drill in Distributed mode, perhaps I will give it another try. If I can get around the classloading / service locator problems with Jersey, perhaps I'll achieve my end goal :)
Cheers!
This blog post was originally published on the MapR blog.
Q & A from “The Future of Hadoop Analytics: Total Data Warehouses and Self-Service Data Exploration” Webinar
The recent MapR webinar titled “The Future of Hadoop Analytics: Total Data Warehouses and Self-Service Data Exploration” proved to be a highly informative, in-depth look at the future of data warehouses and how SQL-on-Hadoop technologies will play a pivotal role in those settings. Matt Aslett, Research Director for 451 Research, along with Apache Drill architect Jacques Nadeau, discussed what lies ahead for enterprise data warehouse architects and BI users in 2015 and beyond.
If you missed the webinar, you can watch the replay here. Following are the answers to the questions that were asked at the end of the webinar:
Q: How is Apache Drill different from SQL-on-Hadoop systems?
Jacques: Traditional SQL and Hadoop systems have been very focused on trying to make Hadoop look like an RDBMS. You start out by sending out schema, and then you query with a SQL-like language. Many of them are not fully embracing SQL as a standard. At the core of it, they’re trying to make it look like a database again.
That’s where Drill is very different. Drill is focused on providing an additional level of flexibility so that users can interact with the new types of data without making it look like a database first. If you think about the chart that Matt showed early on, he talked about users who were replacing their existing data warehouses, and then there are users who are trying to add Hadoop for new use cases. By and large, traditional SQL and Hadoop is really focused on trying to replace your existing data warehouse. While Drill is fully capable of supporting that use case, Drill is also focused on enabling a lot of new use cases that you traditionally couldn’t do without a lot of IT involvement.
Matt: And the way in which Hadoop is going to complement the existing analytic databases is not just by being another platform for storing more data, but for opening up new approaches to engaging with that data. Certainly from our analytics coverage, we see a lot of interest in self-service analytics and self-service integration. Hadoop and schema-on-read will be key to opening up those opportunities to get additional value from that data.
Jacques: The key to Hadoop, like any parallel system, is the ability to scale out as you need to in order to support a certain quantity of data at a certain speed. Drill actually sidesteps using MapReduce altogether. It maintains its own execution engine that allows it to achieve very low latency. That being said, if you have one computer and you have seven terabytes of data, you’re probably not going to get your answer back very quickly, so sizing is also a consideration. The key difference is that Drill allows you to analyze this semi-structured data without additional work. The reason that Drill allows you to do that is because you may not be aware of the data patterns (there are typically patterns inside the data). Drill does a good job of analyzing the data as it’s coming through, and compiling code at runtime to process that data very quickly, just as if we had known the schema ahead of time. Then we simply recompile as we see changes in that schema. Drill can not only provide users with the highest level of flexibility, but it also focuses on providing performance at the highest levels. In addition, Drill is very much in parity with the fastest of the other SQL-on-Hadoop solutions.
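The "recompile as we see changes in that schema" idea can be pictured with a toy sketch. The Python below only tracks the observed schema of each record in a stream and counts the points at which it changes; each change stands in for the moment an engine would regenerate its processing code. The function names and the sample stream are hypothetical, and no actual code generation happens here, so treat it as an illustration of the concept rather than a description of Drill's internals.

```python
def schema_of(record):
    """Return the observed schema of one record as a sorted tuple of (field, type name)."""
    return tuple(sorted((key, type(value).__name__) for key, value in record.items()))


def count_recompiles(records):
    """Count how many times the observed schema changes across a stream of records.

    The first record always counts as one change, since nothing was known before it.
    """
    recompiles = 0
    current = None
    for record in records:
        observed = schema_of(record)
        if observed != current:
            recompiles += 1  # a real engine would regenerate its operators here
            current = observed
    return recompiles


# Hypothetical stream: the third record introduces a new field.
stream = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "tags": ["x"]},
    {"id": 4, "name": "d", "tags": ["y"]},
]
print(count_recompiles(stream))
```

As long as the data keeps the same shape, the work already done is reused; only a genuine change in shape triggers new work.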
Q: Do you have a query plan optimizer?
Jacques: We have an advanced parallel query optimizer that allows Drill to make smart decisions about how to place the work as well as how to replan the query. With any kind of Hadoop system where you’re going to have minimal statistics, the optimizer can’t do quite as well as if you had full statistics on the data. However, Drill does generate full statistics in providing those as well, in order to do more advanced levels of optimization. Beyond that, Drill provides a substantial amount of extensibility so that you can customize your query workloads to make sure that it goes fast and to really control how that query is going to be operated.
Q: What are the use cases for Apache Drill?
Jacques: It runs the gamut. There are people who are saying, “I just want to take a bunch of extra data that I can’t fit into my data warehouse, and I want to analyze that with SQL.” So that’s the traditional way to replace or offload the data warehouse; Drill is very well suited for that type of workload.
But it was also designed for the situation where you have new types of data that you need to interact with quickly, and come up with conclusions based on that data. And that could mean interacting with JSON without manipulating it, or working with a NoSQL solution like HBase. Drill allows you to query using those tools or using standard SQL, but with extensions so that you can interact with the data in a reasonable way.
Q. Are there different types of connectors for Apache Drill?
Jacques: Drill provides the standard JDBC and ODBC. It also provides lower level, schemaless APIs in both C++ and Java. It provides a CLI (command line interface) as well as a REST interface.
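As a rough idea of what using the REST interface might look like, the Python sketch below builds (but does not send) an HTTP request carrying a SQL query. The endpoint path `/query.json`, the default port 8047, and the `queryType`/`query` payload fields are assumptions taken from Drill's REST API as documented in later releases; the version discussed in this webinar may have differed, so check the documentation for your release.

```python
import json
import urllib.request


def build_query_request(sql, host="localhost", port=8047):
    """Build a POST request that submits a SQL query over HTTP.

    The URL path, port and payload shape are assumptions (see the text above).
    """
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 5")
print(request.full_url)

# To actually send it against a running Drillbit (not done here):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is built separately from sending it so that the payload can be inspected or logged before anything touches the network.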
Q. Does Apache Drill store any info in its own DB?
Jacques: No. Drill is designed as a distributed query layer. One of the strengths of Hadoop is the loose coupling between systems; trying to be monolithic about approaching one of these problems generally causes problems for other tools that you want to use to interact with that data. The key thing that is changing in Hadoop is that you may have data stored in a certain way, and you want six different systems to interact with it. So having it stored in one particular system is not the ideal approach. Apache Drill is data agnostic; it’s more optimized for certain types of data formats, but its focus is to support all of the different queries that exist, as well as other native formats.
Q. Does Apache Drill also take one of the sources as in-memory?
Jacques: Yes. Initially, we have a certain set of meta sources which do not include an in-memory source. That being said, we actually have shown in numerous cases that providing a new storage plug-in connector for a new type of data source is very straightforward. We are actually looking at supporting REST APIs as well as certain socket interfaces as direct interfaces for Drill. Since we’re in the open source community, if you’re interested in something, you can always come to the community and suggest a feature; we can look at it together and work on that.
Q. Does Apache Drill integrate with other tools such as Spring?
Jacques: Right now, Drill is primarily focused on analytical workloads, not operational workloads – inserts, updates and deletes. It’s fully SQL-compliant for analytical workloads, but it doesn’t yet support operational workloads. That’s something that will be coming in the next 6-12 months. At that time, we will probably look at integrating Drill with Spring, ORMs, and those types of tools. Generally speaking, for analytical workloads, that’s not the most common interface.
Q. What about subquery support?
Jacques: Yes, Drill supports correlated subqueries with Exists, IN, etc. Drill includes standard syntax for all of those. One of the things that Drill did as an open source project was to pick up a very mature SQL parser that supports a very strong SQL spec. This is a foundational principle of Drill: it allows you to interact with it by using standard SQL. We won’t be deviating like HiveQL does; we are focused on standard SQL, and making it work the way it should work.
Q. Does Drill support distributed joins?
Jacques: Yes. Anything that’s a data table within Drill can be queried across. Drill actually exposes things in terms of databases and schemas, but Drill is different from most data systems in that Drill supports analysis and joins across any of those databases and schemas in the same query.
Q: With all of these new approaches, what does it mean in terms of the importance of Hive?
Matt: You talked about how Drill can enable you to do things that you can’t do with Hive. From our perspective, Hive is going to continue to be important; obviously there are a lot of people out there that are involved in Hadoop projects that are using Hive already and have applications that depend on Hive. But the new technologies and products enable them to take different approaches for newer applications. Hive will become one of many approaches to SQL and Hadoop rather than being the dominant approach.
Jacques: I would second that. I think Hive is critical to many of our customers, and it will continue to be an important player in the space. The reality is that each of these tools will find use cases where it’s really, really good, as well as use cases where it’s not. Five years ago, people asked the same questions about Cassandra and HBase – why would you have both Cassandra and HBase? What we’ve found, over the last five years in both of those cases, is that both tools became specialized in certain sub use cases. I think that will continue to happen; Hive has an extraordinarily rich history and a very large feature base, and it’s actually much broader than most of the other open source solutions. But it also carries with it a legacy of complicated interworkings that were built organically. The pieces are all going to fit together; the strength of the Hadoop system is the fact that there doesn’t have to be one answer, and only one answer. Depending on particular use cases, and depending on people’s comfort with a particular type of technology, all of these projects will continue to have a place. There will probably be more specialization over the next five years with things like Spark, SQL, etc. There will be a lot of specialization towards workloads that specifically involve things like machine learning and more complicated semi-interactive work flows. I call those work flows semi-interactive only because you have to write a bunch of code before you can run them, and that means it’s going to be a little less important to have those few-second response times. Hive is known for its extreme scale, in part because it relies heavily on MapReduce for its execution, but it also has some legacy around how it can serve a large number of users because of that architecture. So each of these tools will have a huge amount of specialization.
Q: How do you do transformation with Hadoop?
Matt: Clearly, you can write the job itself in terms of transformation, but more commonly, we see that people are using existing ETL tools. Most of those now support Hadoop as the underlying engine, as well as the source and the target for ETL jobs.
Q: How do you get started with Apache Drill?
Jacques: We provide a MapR Sandbox with Apache Drill, which is a VM that you can download to play with the tools and begin to understand how they all work together. As part of that, we provide a large number of tutorials so that you can walk through a use case and better understand what types of different technologies will solve your particular problem.
Want to learn more? Check out these resources on MapR and Apache Drill:
Download the MapR Sandbox with Apache Drill: https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
There are several projects for SQL-on-Hadoop. What makes Drill different? What are the top 10 reasons why Drill is a valuable and innovative technology in your tool belt for interactive data exploration on big data?
To read more, the whole blog post can be found here.
Announcing the Apache Drill Beta Release, Self Service Data Exploration in Action
It is our pleasure to announce the 0.5.0 release of Apache Drill. This is Drill’s first beta release and the second in our iterative monthly release cycle. It includes more than 100 issues addressed since last month’s release and more than 1,000 addressed since Drill’s inception. This is a great release to start exploring your data, wherever and whatever it is.
To read more on this release, please visit the original Apache blog post by committer Jacques Nadeau.
What questions or comments do you have about the design of Drill?
What are your thoughts or suggestions for the Drill community?
The Bay Area Apache Drill User group is going to meet in San Jose, California next Monday 24 February at 6pm, and wherever you may live, we want to hear from you.
Please tweet your ideas or comments using the hashtag #drilltalk by Monday evening Pacific time (you can follow Drill on Twitter as @ApacheDrill). Or add a comment or question here.
http://bit.ly/1gB2E6p
And to get you thinking about how you’d use Drill, I recently asked Michael Hausenblas (MapR Chief Data Engineer and Drill contributor) for his thoughts looking forward to what Drill will do:
“Apache Drill allows business analysts to query heterogeneous data sources at scale, in a time-efficient and familiar way.
* Heterogeneous data sources … no matter if the data resides in existing relational databases (such as Oracle DB, MySQL, etc.), in a NoSQL database such as MongoDB or is available as Apache Hadoop-native, that is, in HDFS, MapR-FS or HBase, Apache Drill queries the data in situ. By querying the data where it sits, there is no ETL process required to move the data into a central location as is usual in a data warehouse setting.
* At scale … Drill works well for small-sized datasets (a few gigabytes) but also scales out to the terabyte and petabyte range, depending only on the number of machines available in a cluster (hence dictating the degree of parallelism at which a query can be executed).
* Time-efficient … this means two things in the context of Drill:
  * Because there is no ETL step involved, the data can be queried directly where it is located.
  * Due to the way the query is executed (based on Google Dremel’s multi-level execution tree, in-memory, streaming operators, etc.) with Drill the response times are typically in the low seconds. This rapid response time is possible even on large datasets, which means it is well suited for low-latency application scenarios. Imagine someone sitting in front of a BI tool clicking on a button, expecting an answer immediately rather than the minutes or hours generally expected from MapReduce-based systems.
* Familiar way … on the one hand this means that standard query interfaces such as full SQL support are guaranteed with Drill (no matter if the data resides in a strongly-typed datasource such as a RDBMS or exists as JSON files in, say, HDFS) but also that ad-hoc queries are possible.”
With those thoughts about Drill in mind, what are your ideas about how you'd use it?
Tweet your comments/questions with hashtag #drilltalk and @ApacheDrill to join the discussion on Monday.
Congratulations New Apache Drill Committers & Mentor
by Ellen Friedman on Twitter as @Ellen_Friedman
As we welcome the new year, Apache Drill has two new committers: Timothy Chen and Julian Hyde. Their hard work on behalf of Drill has earned the notice and gratitude of the project and community.
Tim Chen is an engineer at Microsoft in Seattle who recently spoke at the Bay Area Apache Drill User Group meet-up about his work related to the lifetime of a Drill query end-to-end. Tim’s presentation was part of the celebration of the first milestone release for Drill. Please see the earlier post here at the Drill User blog for details. You can find out more at Tim’s blog or follow him on Twitter @tnachen
Julian Hyde was an engineer at Pentaho who recently moved to Hortonworks. For Drill, Julian has worked on the SQL. Julian is also lead developer of the Mondrian OLAP engine and the Optiq data platform and is one of the authors of the Manning book Mondrian in Action (http://www.manning.com/back/). Julian will be one of the speakers at the next Bay Area Apache Drill User Group planned for 24 Feb 2014. Stay tuned for details. You can follow Julian on Twitter @julianhyde
Drill is also fortunate to have the help of a new project mentor, Sebastian Schelter. Sebastian is a PhD student and research associate at TU Berlin, with expertise in machine learning, especially recommendation. Sebastian is active with the Apache Foundation, being a PMC member and committer for the Apache Mahout project. Sebastian is on Twitter as @sscdotopen
And a Happy New Year for 2014 to you all!
Follow the Apache Drill community on Twitter @ApacheDrill
Check out the Apache Drill project website at http://bit.ly/YDkYEl
Apache Drill Query in Action: Drill User Group Event
By Ellen Friedman, Twitter ID: @Ellen_Friedman
Co-organizer of Bay Area Apache Drill User Group
The Apache Drill project is building an innovative tool for ad hoc, interactive queries in the time scale of 100ms to 20 minutes on large, distributed data systems. Participants in the open source Apache Drill community recently came together to take a look at how Drill works now and what will be the next steps in the project.
The event was the November 4th meet-up of the Bay Area Apache Drill Users, the first entirely Drill-based meet-up group. This meeting was hosted at Cisco in San Jose, with MapR Technologies as co-host. A large group collected on-site at Cisco’s conference facility and almost as many participants joined remotely via WebEx. The event marked the recent first official release of the Apache Drill project.
Speakers included two members of the MapR Drill team from San Jose, Drill lead engineer Jacques Nadeau and developer Steven Phillips, as well as Timothy Chen, a Drill contributor who lives in Seattle, where he is an engineer at Microsoft.
Tim Chen came down to San Jose earlier in the day before the meet-up to get together directly with other Drill developers and enjoy the unusual situation of being able to discuss the work in person.
Drill meet-up speakers Steven, Tim & Jacques
Apache Drill: A Look Forward
Jacques Nadeau kicked off the evening meet-up with a road map toward maturity at version 1.0 for the Drill project. He pointed out that Milestone 1 was achieved in late September when the Apache Foundation approved the first official code release. Progress toward Milestone 2 is actively under way now.
Does a project need to reach version 1.0 to be usable? The answer varies with the project, but generally an Apache project releases usable but work-in-progress versions before full maturity. An example is the Apache Mahout project, which is currently at version 0.8 and yet has been used successfully in production settings for over a year. Drill isn’t at that stage now – functionality is built in stages – but it’s beginning to be ready for early users to try it out and give feedback.
Milestone 1: Initial functionality PASSED
JDBC, Distributed execution, Parquet and JSON readers
Milestone 2: Architectural validation IN PROGRESS
Performance, total sort, node buffering, diagnostic tools and instrumentation, Parquet writer
Apache Drill is an ambitious project designed to be more flexible, more wide ranging and extensible than many of the other tools being built to address similar issues. That’s a challenge, but one that is being met with some very promising initial work.
Lifetime of a Query in Apache Drill
The second speaker, Timothy Chen, presented the story of the lifetime of a Drill query, starting with SQL input and following the events to the distributed Drillbits on different nodes.
To follow what happens to a query, it’s helpful to understand that Drill, like Google’s Dremel project, relies on multi-level execution trees and leverages columnar-oriented storage. Whether schemaless or not, abstractly you can think of each data tree as a JSON object. Each tree is composed of a key and a root node. (ref to Dremel paper: http://research.google.com/pubs/pub36632.html)
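For readers new to columnar-oriented storage, the small Python sketch below pivots row-oriented, JSON-like records into one list of values per field. It is a deliberately simplified illustration with hypothetical data: it handles only flat records, whereas the Dremel paper describes a more involved encoding (repetition and definition levels) for nested and repeated fields, which is not shown here.

```python
def to_columns(records):
    """Pivot row-oriented records into a dict of column name -> list of values.

    Missing fields are filled with None so every column has one entry per row.
    """
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)  # keep first-seen field order
    return {name: [record.get(name) for record in records] for name in names}


# Hypothetical flat records; the second one carries an extra field.
rows = [
    {"id": 1, "city": "San Jose"},
    {"id": 2, "city": "Seattle", "stars": 4},
]
columns = to_columns(rows)
print(columns)
```

A query that touches only one field can then scan a single list instead of every whole record, which is the main appeal of a columnar layout for analytics.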
Another important concept is the DrillBit: as Tim explained, a DrillBit is simply a worker process running on any particular node in the cluster. To tell the story of what happens to a user query as it is processed by Drill, Tim used an example system that included DrillBits on three nodes plus the coordinating services of ZooKeeper and Hazelcast.
Drill can accept full ANSI SQL:2003 queries, which in turn are passed via Sqlline to Optiq, a library Drill uses for SQL parsing and planning according to a collection of Planning Rules. These rules come into play as the system builds a logical plan for the query. The logical plan describes the abstract dataflow of the query (which is language-agnostic). The logical plan tries to work with primitive operators without focusing on optimization at this stage.
Figure shows highly simplified view of the lifetime of a Drill query
The next step is for the logical plan to be passed to and through the Foreman. A Foreman in Drill is the DrillBit that initially handles the query, effectively forming the root node of the multi-level execution tree. Any DrillBit potentially could serve as Foreman, but once the process is in motion, the Foreman will direct processing to appropriate additional DrillBits on other nodes, to maximize locality. A number of things happen at this stage, as the Foreman turns the logical plan into a physical plan for execution.
This is of course a very simplified summary of the detailed sequence described during the talk.
Apache Drill Live Demo: Drill performs on distributed nodes
Steven Phillips closed the evening with an in-depth technical discussion of the current state of Drill milestone 1, followed by a live demo. His presentation included a particular focus on the physical operators now in place.
Steven explained that the Drill logical plan is designed to be as easy as possible for language implementers to use. With the design aimed at a high degree of flexibility, Drill does not constrain queries to a SQL-specific paradigm – instead, it also supports complex data type operators such as collapse and expand.
In addition to his detailed discussion of the current features of the alpha release, Steven included a live demo of a query being processed by Drill. For simplicity in the presentation, Steven ran his query on a single machine, but one of the advances in the first milestone version is that distributed mode is possible. This ability for Drill to run on a distributed system is a large step in the project since this summer when participants tried Drill queries on single machines during a Drill workshop at OSCON. At this stage of development, distributed mode is still somewhat cumbersome, requiring manual submission of a physical plan. To make this easier, Drill contributor Michael Hausenblas has put together a detailed description of how to do it: https://github.com/mhausenblas/apache-drill-sandbox/tree/master/M1
Code for the first milestone release of Drill can be found at the official project website. A link to the WebEx recording made available by Cisco for this meet-up is found below, along with a link to Tim Chen’s blog on his talk.
Apache Drill Community
One of the strengths of an open source project developed under the umbrella of the Apache Foundation is that the community grows as the code is developed. The resulting project reflects a collective effort both from developers and early users, who can provide valuable feedback to guide further design. Apache Drill is fortunate to have a strong and growing community as it passes its first milestone release.
One of the challenges for an Apache project, however, is how to keep diverse members of the project connected, especially when they are separated by geography and often time zone. A live meet-up of the Drill community members in real-time helps to build communication and connections and gives the project a boost. Thanks to all who made this meet-up possible.
Apache Drill is an open source project that welcomes your participation. You can find out more on the project website, by joining the Bay Area meet-up, or by following the project on Twitter.
Apache Drill Resources
Follow on Twitter: @ApacheDrill
WebEx recording of Nov 4, 2013 meet-up presentation, runs 1 hour 41 min: https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=72775662&rKey=031c783655239fd8
Tim Chen blog on “Lifetime of a Query in Apache Drill Alpha”: http://bit.ly/1erl77n
Bay Area Apache Drill User Group: http://bit.ly/17ArvnP
Apache Drill official project web site includes access to code for 1st milestone release: http://bit.ly/YDkYEl
In his recent blog post, Yash Sharma provides a detailed account of how to contribute to Apache Drill: Implementing Drill Math Functions. The article is geared towards Java developers, but I'd argue that Apache Drill users in general would also benefit from studying it.
Huge congrats to the Apache Drill team! The alpha release is being shipped now and Drill has won its first award: it’s one of the best open source big data tools 2013.
BTW, last week I gave an Apache Drill talk and demo at the HUG Stockholm—slides and a video recording are available.
It’s been a very active season for the Drill community as the project prepares for a first milestone release. And with the Drill demo on the website and participation in a hands-on workshop, the “user” part of this Apache Drill User site is beginning to live up to its name.
A sample of events in June – August include:
Article by Michael Hausenblas @mhausenblas and Jacques Nadeau @intjesus “Introduction to Apache Drill: Interactive Ad-Hoc Query for Large-scale Datasets”. Big Data. June 2013, 1(2): 100-104. doi:10.1089/big.2013.0011. http://bit.ly/15101Y7
Drill talks by @mhausenblas at Hive London and in Paris in June
Apache Drill project website redesigned to have a new look: http://incubator.apache.org/drill/
Apache Drill hands-on workshop by Ted Dunning @ted_dunning and Jacques Nadeau @intjesus at OSCON in Portland, Oregon USA in July for ~40 participants. A blog post by Ellen Friedman @Ellen_Friedman reports on that Drill-via-Amazon-Cloud event and includes links to slides: http://bit.ly/18aS3Lk
Drill blog article by S. J. Vaughan-Nichols “Drilling into Big Data with Apache Drill” in Aug: http://bit.ly/1309MXA
Apache Drill project featured by panelist @tshiran in Aug for the “Hadoop + SQL” Hive Data Think Tank event in California Bay Area. A blog posting as a prelude to the event can be found here: http://bit.ly/1cvxn5D
New developers and non-code contributors are participating in the community
Discussion is getting started on the Apache Drill user mailing list: http://bit.ly/19modUt
Twitter group for @ApacheDrill grew significantly to 437 followers.
And September is starting with more activity, including upcoming meetings of the Bay Area Apache Drill User group featuring MapR engineer Steve Phillips in September and a still-to-be-scheduled talk by Tim Chen (Microsoft) most likely in late Oct or November. Stay tuned!
NoSQL matters 2013 in Cologne, Germany—lots of good discussions and great people around for the Apache Drill training day, thank you everybody involved and hope to 'see' you on the mailing list, on Twitter or F2F next time!
Apache Drill User @drill-user - Tumblr Blog | Tumlook