Apache Spark flatMap transformation
Apache Spark
In our previous post, we talked about the Map transformation in Spark. In this post we will learn the flatMap transformation.
As per the Apache Spark documentation, flatMap(func) is similar to map, but each input item can be mapped to 0 or more output items. That means func should return a scala.collection.Seq rather than a single item.
Let's…
Made a top view of the proposed scene, with a half-buried hedron in the middle and a broken one on the side. Squares represent cabins and circles represent trees. Circles with a cross represent more lifeless trees, perhaps burned and snow covered? The sky should also contain some hedrons floating in the distance. Hedrons should be futuristic but with exposed components representing damage perhaps? Or can go for a more steampunk vibe with pistons outside? Cabins need to be primitive, and maybe some even collapsed. The whole scene needs to be covered in snow. Have not yet decided if human figures will be included. Could add some smoke indicating a campfire.
This article is about FlatMap in Angular. Let's say we wanted to implement an AJAX search feature in which every keypress in a text field automatically performs a search and updates the page with the results. How would this look? Well, we would have an Observable subscribed to events coming from an input field, and on every change of input we want to perform some HTTP…
If you read our blog, you may have noticed that the Node.js community is growing extremely fast. In this post, we are happy to announce wonderful news: Node 12.6.0 and 12.5.0 have been released! There are many changes in the new versions of Node. Here we gathered the most important ones: build: su...
We continue with the series Learn Apache Spark with Java. Apache Spark 2: Flat map with Java #java #apachespark #transformations #flatmap #devs4j
Unlike Map, FlatMap lets us generate a list of elements from a single one. With Map we had one input element and one output element; with FlatMap we have one input element and generate a set of output elements from it.
In the post Apache Spark 2 : Conceptos básicos we used flat map to perform a word count; in this post we will do a…
In this blog, we will look at how the map() and flatMap() operations work with Scala's Option and Future. Both are very useful features of Scala: a Future lets us obtain a value from a task running on a different thread, and Option saves us from Java's null, since using null in Scala is considered bad practice in functional programming.
We've been using Rx for a while now and across a variety of projects. Yet we continue to learn new things - in this case, a clear understanding of FlatMap.
My team and I recently discovered a bug in one of our projects, and the culprit turned out to be the FlatMap operator, or rather, our misuse of it. I don't know why, but FlatMap was a recurring source of confusion. (And the official Rx docs didn't really shed much light on the subject.) Fortunately for us, we eventually gained clarity around the FlatMap operator using a real-life analogy that I thought would be worth sharing:
FlatMap is like the combining or flattening of commits pushed by a growing team of developers on a project.
Think of a team of developers on a project that uses a Continuous Integration (CI) service to build each pushed commit. The CI is interested in all of the commits pushed by all of the developers, including newly added ones, on the project.
FlatMap Analogy. Animation by Deyu Wang
We can dig into this more with code, in this case Swift (and RxSwift). First, let's define the objects we'll need.

    import RxSwift

    class Project {
        private let developerSubject = PublishSubject<Developer>()

        var developerStream: Observable<Developer> {
            return developerSubject.asObservable()
        }

        func addDeveloper(_ developer: Developer) {
            developerSubject.onNext(developer)
        }

        func stop() {
            developerSubject.onCompleted()
        }
    }

    class Developer {
        private let commitSubject = PublishSubject<Commit>()
        let name: String

        init(_ name: String) {
            self.name = name
        }

        func startCoding() -> Observable<Commit> {
            return commitSubject.asObservable()
        }

        func stopCoding() {
            commitSubject.onCompleted()
        }

        // Helper to externally simulate coding activity.
        func pushCommit(_ hash: String) {
            commitSubject.onNext(Commit(author: name, hash: hash))
        }
    }

    struct Commit {
        let author: String
        let hash: String
    }

    class CI: ObserverType {
        func on(_ event: Event<Commit>) {
            switch event {
            case .next(let commit):
                print("CI is building \(commit).")
            case .completed:
                print("CI stopped.")
            case .error(let error):
                print("CI errored: \(error).")
            }
        }
    }

Type Declarations
Now let's instantiate things and hook it all up.

    let project = Project()
    let jim = Developer("Jim")
    let anna = Developer("Anna")
    let bob = Developer("Bob")
    let ci = CI()

    project.developerStream
        .flatMap { developer -> Observable<Commit> in
            print("\(developer.name) started coding...")
            return developer.startCoding()
        }
        .subscribe(ci)

Setup Objects
We flatMap the developerStream onto each developer's commits. The resulting stream is subscribed to by the CI.
Looking closer, in the flatMap closure, for each developer emitted on developerStream, we return the developer's stream of commits via the startCoding() function. This allows the flatMap to observe the commits emitted by all developers and then flatten them into a single output stream. The CI subscribes to this output stream so it can build each commit.
Let's play with this to see what happens when we start adding developers and pushing commits. The trailing comments are the output from the print()s.

    project.addDeveloper(jim)   // Jim started coding...
    jim.pushCommit("1")         // CI is building Commit(author: "Jim", hash: "1").
    project.addDeveloper(anna)  // Anna started coding...
    jim.pushCommit("2")         // CI is building Commit(author: "Jim", hash: "2").
    anna.pushCommit("3")        // CI is building Commit(author: "Anna", hash: "3").
    jim.pushCommit("4")         // CI is building Commit(author: "Jim", hash: "4").

Normal Operation
Notice how even after Anna is added to the project, Jim remains "active" and the CI continues to see Jim's additional commits. Jim's commit stream returned from startCoding() doesn't get replaced by Anna's. Further, the order in which the developers are added doesn't matter. The CI only sees the commits in the sequence they are pushed. The CI doesn't care about the developers themselves; it only cares about their commits.
Taking a closer look, our flatMap subscribes to the commits from each developer received from developerStream. Even when new developers are received, those subscriptions continue to live on. This enables the commits from both Jim and Anna to be combined and flattened into a single output stream.
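The "flatten" half of the name may be easier to see on plain arrays first. The following is a minimal sketch using only the Swift standard library, not RxSwift; the `developers` data is made up for illustration and only loosely mirrors the commits above.

```swift
// Illustrative data (an assumption, not from the article's code):
// each developer paired with the commit hashes they pushed.
let developers: [(name: String, hashes: [String])] = [
    (name: "Jim", hashes: ["1", "2", "4"]),
    (name: "Anna", hashes: ["3"]),
    (name: "Bob", hashes: []),  // an input may map to zero outputs
]

// map produces exactly one output per input, so the result stays nested.
let nested: [[String]] = developers.map { $0.hashes }

// flatMap maps each input to a sequence, then concatenates the sequences.
let flat: [String] = developers.flatMap { $0.hashes }

print(nested) // [["1", "2", "4"], ["3"], []]
print(flat)   // ["1", "2", "4", "3"]
```

One difference worth keeping in mind: the array version concatenates results in input order, whereas the Rx flatMap described here forwards each commit as it arrives, so commits from different developers interleave by time.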
Wait, how is this different from **Merge**?
Instead of FlatMap, we could of course use the Merge operator to combine commits from multiple developers into a single stream. If the project's developers are known and fixed, this would be fine, as Merge takes a static list of observables.
However, if Bob came along to join the project at a later point, we'd have to handle that somehow. With FlatMap, since we subscribe to new developers being added, it's handled for us already. Bob's commits would automatically be taken into account and added to the flatMap's output stream. Thus, we could say that Merge is for a static list of observables, while FlatMap is for a dynamic list of observables.
So, that's most of it. But let's get a more complete (Rx joke) understanding of how FlatMap works by exploring complete and error events.
Completions
It's easy to start things, hard to complete them.
This saying is very true for FlatMap. In a nutshell, the CI will receive the completed event once all "active" observables in the chain have completed.
Let's look at an example where, after some normal operation, only the project stops. Will the CI stop?

    project.addDeveloper(jim)  // Jim started coding...
    project.stop()
    project.addDeveloper(bob)
    bob.pushCommit("1")
    jim.pushCommit("2")        // CI is building Commit(author: "Jim", hash: "2").

Completing the Project (Parent Observable)
No, the CI doesn't stop. After the project stops, adding Bob has no effect because developerStream has already completed, so he's never "activated" in the flatMap. Thus, when Bob pushes a commit, the CI doesn't see it. However, existing project members like Jim are free to continue pounding away and pushing commits, which do get built by the CI.
Now let's examine when, instead of the project stopping, every developer on the project stops. Will the CI stop?

    project.addDeveloper(jim)   // Jim started coding...
    project.addDeveloper(anna)  // Anna started coding...
    jim.stopCoding()
    anna.stopCoding()
    jim.pushCommit("1")
    project.addDeveloper(bob)   // Bob started coding...
    bob.pushCommit("2")         // CI is building Commit(author: "Bob", hash: "2").

Completing the Developers (Child Observables)
No, the CI doesn't stop. As shown, when all the project's developers stop, it affects neither the CI nor the project. Bob is still able to join the project afterwards and push commits, which the CI builds.
The only way to stop the CI's subscription is to stop the project and all existing developers:

    project.addDeveloper(jim)   // Jim started coding...
    project.addDeveloper(anna)  // Anna started coding...
    project.stop()
    jim.stopCoding()
    anna.stopCoding()           // CI stopped.
    project.addDeveloper(bob)
    bob.pushCommit("1")
    jim.pushCommit("2")

Completing the CI's Subscription
It's worth noting here that if no developers are ever added to the project, calling project.stop() would also stop the CI.
Errors
error events work as expected. When either the project or any added developer errors, the CI will receive the error event and the whole chain stops working.
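The completion and error rules described above can be modelled without RxSwift. The following is a rough sketch using only the Swift standard library; `StreamEvent`, `MiniSubject`, `miniFlatMap`, and `run` are hypothetical names invented for this illustration, and the model ignores disposal and threading, so it should be read as a toy rather than as how RxSwift is implemented.

```swift
// Toy model of flatMap termination rules. NOT RxSwift: all names here are
// hypothetical and exist only for this illustration.
enum StreamEvent<T> {
    case next(T)
    case completed
    case error(String)
}

final class MiniSubject<T> {
    private var observers: [(StreamEvent<T>) -> Void] = []
    private var terminated = false

    func subscribe(_ observer: @escaping (StreamEvent<T>) -> Void) {
        observers.append(observer)
    }

    func send(_ event: StreamEvent<T>) {
        guard !terminated else { return }  // nothing is delivered after a terminal event
        switch event {
        case .next: break
        case .completed, .error: terminated = true
        }
        observers.forEach { $0(event) }
    }
}

func miniFlatMap<A, B>(_ source: MiniSubject<A>,
                       _ transform: @escaping (A) -> MiniSubject<B>) -> MiniSubject<B> {
    let output = MiniSubject<B>()
    var sourceCompleted = false
    var activeInner = 0

    // The output completes only when the parent and every child have completed.
    func completeIfIdle() {
        if sourceCompleted && activeInner == 0 { output.send(.completed) }
    }

    source.subscribe { event in
        switch event {
        case .next(let value):
            activeInner += 1
            transform(value).subscribe { innerEvent in
                switch innerEvent {
                case .next(let item): output.send(.next(item))
                case .completed:
                    activeInner -= 1
                    completeIfIdle()
                case .error(let message): output.send(.error(message))  // any child error ends the output
                }
            }
        case .completed:
            sourceCompleted = true
            completeIfIdle()
        case .error(let message):
            output.send(.error(message))  // a parent error ends the output too
        }
    }
    return output
}

// Wires a project and two developers to a logging "CI", runs the script, returns the log.
func run(_ script: (MiniSubject<MiniSubject<String>>, MiniSubject<String>, MiniSubject<String>) -> Void) -> [String] {
    var log: [String] = []
    let project = MiniSubject<MiniSubject<String>>()
    let jim = MiniSubject<String>()
    let anna = MiniSubject<String>()
    miniFlatMap(project) { developer in developer }.subscribe { event in
        switch event {
        case .next(let hash): log.append("build \(hash)")
        case .completed: log.append("stopped")
        case .error(let message): log.append("errored: \(message)")
        }
    }
    script(project, jim, anna)
    return log
}

let completionLog = run { project, jim, anna in
    project.send(.next(jim))
    project.send(.next(anna))
    jim.send(.next("1"))
    project.send(.completed)  // parent done, children still active: CI keeps running
    anna.send(.next("2"))
    jim.send(.completed)      // one child still active
    anna.send(.completed)     // parent and all children done: CI stops
    jim.send(.next("3"))      // ignored
}
print(completionLog) // ["build 1", "build 2", "stopped"]

let errorLog = run { project, jim, anna in
    project.send(.next(jim))
    project.send(.next(anna))
    jim.send(.next("1"))
    anna.send(.error("disk full"))  // one child errors: the whole chain stops
    jim.send(.next("2"))            // ignored
}
print(errorLog) // ["build 1", "errored: disk full"]
```

The two counters in `miniFlatMap` are the whole story in this model: completion needs both the parent flag and a zero count of active children, while an error from either level terminates the output immediately.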
Conclusion
Now that we know how FlatMap actually behaves, we're able to use it with more confidence and in the right places.
The easiest way to play with this code (or any RxSwift code) on your own is to use the RxSwift repo's playground. To get a closer look at things, including when the isDisposed event is being emitted, you may want to add this code to the playground along with some debug operators.
Acknowledgements
I'd like to thank Eli Burnstein for helping shape and edit this article, Deyu Wang for creating the visual animation, and everyone on my project team for helping me better understand FlatMap.