![Young business people looking at computer on office desk at night](https://wordpress-1016567-4521551.cloudwaysapps.com/wp-content/uploads/2023/11/level-up-your-kafka-applications-with-schemas-ibm-blog.webp)
Apache Kafka is a well-known open-source event store and stream processing platform and has grown to become the de facto standard for data streaming. In this article, developer Michael Burgess provides an insight into the concept of schemas and schema management as a way to add value to your event-driven applications on the fully managed Kafka service, IBM Event Streams on IBM Cloud®.
What is a schema?
A schema describes the structure of data.
For example:
A simple Java class modelling an order of some product from an online store might start with fields like:
public class Order {
    private String productName;
    private String productCode;
    private int quantity;
    // […]
}
If order objects were being created using this class, and sent to a topic in Kafka, we could describe the structure of those records using a schema such as this Avro schema:
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "productName", "type": "string"},
    {"name": "productCode", "type": "string"},
    {"name": "quantity", "type": "int"}
  ]
}
Why should you use a schema?
Apache Kafka transfers data without validating the information in the messages. It has no visibility of what kind of data are being sent and received, or of what data types a message might contain. Kafka does not examine the metadata of your messages.
One of the functions of Kafka is to decouple consuming and producing applications, so that they communicate via a Kafka topic rather than directly. This allows them to each work at their own speed, but they still need to agree upon the same data structure; otherwise, the consuming applications have no way to deserialize the data they receive back into something with meaning. The applications all need to share the same assumptions about the structure of the data.
In the scope of Kafka, a schema describes the structure of the data in a message. It defines the fields that need to be present in each message and the types of each field.
This means a schema forms a well-defined contract between a producing application and a consuming application, allowing consuming applications to parse and interpret the data in the messages they receive correctly.
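To make the idea of a contract concrete, here is a minimal sketch using only the JDK. It does not use Kafka or Avro. The class `OrderContractSketch`, its hand-rolled byte layout and the method names are assumptions made for illustration. The layout plays the role of the schema: the consumer recovers the order only because it reads the same fields, in the same sequence and with the same types, as the producer wrote them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the byte layout below stands in for a schema.
class OrderContractSketch {

    // Mirrors the fields of the Order example earlier in the article.
    static final class Order {
        final String productName;
        final String productCode;
        final int quantity;

        Order(String productName, String productCode, int quantity) {
            this.productName = productName;
            this.productCode = productCode;
            this.quantity = quantity;
        }
    }

    // Producer side: turn an Order into the opaque bytes a topic would carry.
    // Layout: [name length][name bytes][code length][code bytes][quantity].
    static byte[] serialize(Order order) {
        byte[] name = order.productName.getBytes(StandardCharsets.UTF_8);
        byte[] code = order.productCode.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length + 4 + code.length + 4);
        buf.putInt(name.length).put(name);
        buf.putInt(code.length).put(code);
        buf.putInt(order.quantity);
        return buf.array();
    }

    // Consumer side: reads the fields back using the same layout.
    static Order deserialize(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        String name = readString(buf);
        String code = readString(buf);
        int quantity = buf.getInt();
        return new Order(name, code, quantity);
    }

    // Reads one length-prefixed UTF-8 string from the buffer.
    static String readString(ByteBuffer buf) {
        byte[] raw = new byte[buf.getInt()];
        buf.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // A consumer with a different assumption: it expects quantity to come first,
    // so it ends up returning the length prefix of the product name instead.
    static int readQuantityFirst(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    public static void main(String[] args) {
        byte[] payload = serialize(new Order("Widget", "W-100", 3));
        Order received = deserialize(payload);
        System.out.println("Agreed layout, quantity = " + received.quantity);
        System.out.println("Mismatched layout, quantity = " + readQuantityFirst(payload));
    }
}
```

Nothing in the payload itself signals that the second consumer is wrong. That gap is what an explicit, shared schema is meant to close.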
What is a schema registry?
A schema registry supports your Kafka cluster by providing a repository for managing and validating schemas within that cluster. It acts as a database for storing your schemas and provides an interface for managing the schema lifecycle and retrieving schemas. A schema registry also validates evolution of schemas.
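As a rough mental model of the "database for schemas" role, the sketch below keeps an ordered list of schema versions per subject in memory. The class `InMemorySchemaRegistry` and its methods are hypothetical and are not the Event Streams Schema Registry API. A real registry also persists schemas, exposes them over a network interface and validates new versions before accepting them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a schema registry; not a real client API.
class InMemorySchemaRegistry {

    // Each subject (for example, one per topic) maps to its ordered schema versions.
    private final Map<String, List<String>> versionsBySubject = new HashMap<>();

    // Stores a new schema version and returns its 1-based version number.
    int register(String subject, String schemaJson) {
        List<String> versions = versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        versions.add(schemaJson);
        return versions.size();
    }

    // Retrieves a specific version of a subject's schema.
    String get(String subject, int version) {
        List<String> versions = versionsBySubject.get(subject);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("Unknown schema: " + subject + " v" + version);
        }
        return versions.get(version - 1);
    }

    // Returns the newest version number, or 0 if the subject has no schemas yet.
    int latestVersion(String subject) {
        List<String> versions = versionsBySubject.get(subject);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        InMemorySchemaRegistry registry = new InMemorySchemaRegistry();
        int v1 = registry.register("orders", "{\"type\": \"record\", \"name\": \"Order\", \"fields\": []}");
        System.out.println("Registered orders v" + v1);
        System.out.println("Latest version: " + registry.latestVersion("orders"));
        System.out.println("Schema: " + registry.get("orders", v1));
    }
}
```

Because old versions stay in the store, a consumer that meets a message written with an earlier schema can still look up how to read it.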
Optimize your Kafka environment by using a schema registry.
A schema registry is essentially an agreement on the structure of your data within your Kafka environment. By having a consistent store of the data formats in your applications, you avoid common mistakes that can occur when building applications, such as poor data quality and inconsistencies between your producing and consuming applications that may eventually lead to data corruption. Having a well-managed schema registry is not just a technical necessity; it also contributes to the strategic goal of treating data as a valuable product and helps tremendously on your data-as-a-product journey.
Using a schema registry increases the quality of your data and ensures data remain consistent by enforcing rules for schema evolution. As well as ensuring data consistency between produced and consumed messages, a schema registry ensures that your messages will remain compatible as schema versions change over time. Over the lifetime of a business, it is very likely that the format of the messages exchanged by the applications supporting the business will need to change. For example, the Order class in the example schema we used earlier might gain a new status field, or the product code field might be replaced by a combination of department number and product number, or the like. The result is that the schema of the objects in our business domain is continually evolving, and so you need to be able to ensure agreement on the schema of messages in any particular topic at any given time.
There are various patterns for schema evolution:
- Forward Compatibility: where the producing applications can be updated to a new version of the schema, and all consuming applications will be able to continue to consume messages while waiting to be migrated to the new version.
- Backward Compatibility: where consuming applications can be migrated to a new version of the schema first, and are able to continue to consume messages produced in the old format while producing applications are migrated.
- Full Compatibility: when schemas are both forward and backward compatible.
A schema registry is able to enforce rules for schema evolution, allowing you to guarantee either forward, backward or full compatibility of new schema versions, preventing incompatible schema versions being introduced.
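The three patterns can be reduced to one question: can a reader that expects one schema decode data written with another? The sketch below answers it with a deliberately simplified rule, loosely modelled on how Avro resolves fields: every field the reader expects must either be present in the written data with the same type, or declare a default. The class `CompatibilitySketch`, the `Field` type and the sample schemas are assumptions made for illustration. Real registries apply more detailed rules, such as type promotion and aliases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified compatibility rules; not a real registry's implementation.
class CompatibilitySketch {

    // A field has a type name and may declare a default value.
    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    // True if a reader using readerSchema can decode data written with writerSchema.
    static boolean canRead(Map<String, Field> readerSchema, Map<String, Field> writerSchema) {
        for (Map.Entry<String, Field> expected : readerSchema.entrySet()) {
            Field written = writerSchema.get(expected.getKey());
            if (written == null) {
                // Field is absent from the data: the reader needs a default to fill it in.
                if (!expected.getValue().hasDefault) return false;
            } else if (!written.type.equals(expected.getValue().type)) {
                return false;
            }
        }
        return true;
    }

    // Backward: consumers on the new schema can read data produced with the old one.
    static boolean isBackwardCompatible(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        return canRead(newSchema, oldSchema);
    }

    // Forward: consumers still on the old schema can read data produced with the new one.
    static boolean isForwardCompatible(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        return canRead(oldSchema, newSchema);
    }

    static boolean isFullyCompatible(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        return isBackwardCompatible(oldSchema, newSchema) && isForwardCompatible(oldSchema, newSchema);
    }

    // The Order schema from earlier in the article.
    static Map<String, Field> orderV1() {
        Map<String, Field> fields = new LinkedHashMap<>();
        fields.put("productName", new Field("string", false));
        fields.put("productCode", new Field("string", false));
        fields.put("quantity", new Field("int", false));
        return fields;
    }

    // Order with an added status field; withDefault controls whether it declares a default.
    static Map<String, Field> orderWithStatus(boolean withDefault) {
        Map<String, Field> fields = orderV1();
        fields.put("status", new Field("string", withDefault));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println("status with default, full: "
                + isFullyCompatible(orderV1(), orderWithStatus(true)));
        System.out.println("status without default, backward: "
                + isBackwardCompatible(orderV1(), orderWithStatus(false)));
        System.out.println("status without default, forward: "
                + isForwardCompatible(orderV1(), orderWithStatus(false)));
    }
}
```

Under these simplified rules, adding the status field with a default keeps the change fully compatible. Adding it without a default remains forward compatible but breaks backward compatibility, because consumers on the new schema cannot fill in the field when they read older messages.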
By providing a repository of versions of schemas used within a Kafka cluster, past and present, a schema registry simplifies adherence to data governance and data quality policies, since it provides a convenient way to track and audit changes to your topic data formats.
What’s next?
In summary, a schema registry plays a crucial role in managing schema evolution, versioning and the consistency of data in distributed systems, ultimately supporting interoperability between different components. Event Streams on IBM Cloud provides a Schema Registry as part of its Enterprise plan. Ensure your environment is optimized by utilizing this feature on the fully managed Kafka offering on IBM Cloud to build intelligent and responsive applications that react to events in real time.
- Provision an instance of Event Streams on IBM Cloud here.
- Learn how to use the Event Streams Schema Registry here.
- Learn more about Kafka and its use cases here.
- For any challenges in setup, see our Getting Started Guide and FAQs.
- Source: https://www.ibm.com/blog/level-up-your-kafka-applications-with-schemas/