Data Engineering

Trainings

Apache Spark is a data analysis and aggregation engine written in Scala. It distributes computation across multiple worker machines in a cluster. What makes pairing Spark with Scala so compelling is that the same data can be analyzed either with functional programming or with SQL.

This course is tailored to data analysts and engineers who want to take control of their data workloads and build solutions on top of them.

Talks

Machine Learning Data Pipelines

How do we move information in real time and connect machine learning models so they can make decisions on our business data? This presentation walks through the machine learning and Kafka tools that help achieve that goal.

Kafka and Streaming

Kafka has captured mindshare in the data streaming market. In this presentation, we knock on its door and see what lies behind it. What is the draw? What makes it an attractive addition to a data architecture? How does it compare to message queues and other message streaming services?

Machine Learning with Spark MLlib

Spark includes a machine learning library called MLlib. We give an introduction to machine learning, survey some common models, and then apply several of those models with MLlib.