Kafka Summit 2017-sf (pipeline)

Billions of Messages a Day – Yelp’s Real-time Data Pipeline

by Justin Cunningham, Technical Lead, Software Engineering, Yelp
Yelp moved quickly into building out a comprehensive service-oriented architecture, and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, dramatically increasing the complexity of bulk data applications. Straightforward solutions like bulk data APIs and sharing data snapshots have significant drawbacks. Yelp’s Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications – making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that make it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. We’ll show how a few simple services at Yelp lay the foundation that powers everything from search to our experimentation framework.

Body Armor for Distributed System

by Michael Egorov, Co-founder and CTO, NuCypher
We show a way to make Kafka end-to-end encrypted. This means that data is only ever decrypted on the producer and consumer side; it is never decrypted broker-side. Importantly, all Kafka clients have their own encryption keys. There is no pre-shared encryption key. Our approach can be compared to TLS implemented for more than two parties connected together.


DNS for Data: The Need for a Stream Registry

by Praveen Hirsave, Director Cloud Engineering, HomeAway
As organizations increasingly adopt streaming platforms such as Kafka, the need for visibility and discovery has become paramount. With the advent of self-service streaming and analytics, overall speed, not only time-to-signal but also time to production, is increasingly the difference between winners and losers. Beyond Kafka being at the core of successful streaming platforms, there is a need for a stream registry. Come to this session to find out how HomeAway is solving this with a “just right” approach to governance.


Efficient Schemas in Motion with Kafka and Schema Registry

by Pat Patterson, Community Champion, StreamSets Inc.
Apache Avro allows data to be self-describing, but carries an overhead when used with message queues such as Apache Kafka. Confluent’s open source Schema Registry integrates with Kafka to allow Avro schemas to be passed ‘by reference’, minimizing overhead, and can be used with any application that uses Avro. Learn about Schema Registry, using it with Kafka, and leveraging it in your application.
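To illustrate what passing a schema ‘by reference’ can look like on the wire, here is a minimal sketch of the framing used by Confluent's Avro serializers: a zero "magic" byte, a 4-byte big-endian schema ID, then the Avro-encoded body. The function names `frame` and `unframe`, the schema ID 42, and the sample payload are assumptions made for illustration; a real application would normally use a Schema Registry client library rather than hand-rolling this.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # leading marker byte of the Confluent wire format
HEADER_LEN = 5   # 1 magic byte + 4-byte big-endian schema ID


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded body with a schema reference (hypothetical helper)."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload) (hypothetical helper)."""
    if len(message) < HEADER_LEN:
        raise ValueError("message too short to carry a schema reference")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_LEN])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER_LEN:]


# Illustrative values only: 42 is a made-up schema ID, and the payload stands in
# for an Avro-encoded record body (here, the Avro encoding of the string "yelp").
payload = b"\x08yelp"
framed = frame(42, payload)
schema_id, body = unframe(framed)
```

The point of the sketch is the fixed 5-byte header: each message carries only a small ID, and a consumer would look up the full schema for that ID in the registry (typically caching it) instead of receiving the schema text with every message.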

From Scaling Nightmare to Stream Dream: Real-time Stream Processing at Scale

by Amy Boyle, Software Engineer, New Relic
On the events pipeline team at New Relic, Kafka is the thread that stitches our micro-service architecture together. We receive billions of monitoring events an hour, which customers rely on us to alert on in real-time. Learn how, facing more than tenfold growth in the system, we avoided a costly scaling nightmare by switching to a streaming system based on Kafka. We follow a DevOps philosophy at New Relic. Thus, I have a personal stake in how well our systems perform. If evaluation deadlines are missed, I lose sleep and customers lose trust. Without necessarily setting out to from the start, we’ve gone all in, using Kafka as the backbone of an event-driven pipeline, as a datastore, and for streaming updates to the system. Hear about what worked for us, what challenges we faced, and how we continue to scale our applications.

How Blizzard Used Kafka to Save Our Pipeline (and Azeroth)

by Jeff Field, Systems Engineer, Blizzard
When Blizzard started sending gameplay data to Hadoop in 2013, we went through several iterations before settling on Flumes in many data centers around the world reading from RabbitMQ and writing to central Flumes in our Los Angeles data center. While this worked at first, by 2015 we were hitting problems scaling to the number of events required. This is how we used Kafka to save our pipeline.


Kafka Connect Best Practices – Advice from the Field

by Randall Hauch, Engineer, Confluent
This talk will review the Kafka Connect Framework and discuss building data pipelines using the library of available Connectors. We’ll deploy several data integration pipelines and demonstrate:

best practices for configuring, managing, and tuning the connectors
tools to monitor data flow through the pipeline
using Kafka Streams applications to transform or enhance the data in flight.



One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers

by Gwen Shapira, Product Manager, Confluent
You have made the transition from single machines and one-off solutions to distributed infrastructure in your data center powered by Apache Kafka. But what if one data center is not enough? In this session, we review resilient data pipelines with Apache Kafka that span multiple data centers. We provide an overview of best practices and common patterns including key areas such as architecture and data replication as well as disaster scenarios and failure handling.

