# Kafka topic and schema registry for Sentry Contains the Kafka topics and schema definitions used by the Sentry service. ## Defining schemas Currently only jsonschema is supported. The jsonschema should be placed directly in the `schemas` directory, and then referenced from the relevant topic via `resource` property. We use jsonschema for both JSON- and msgpack-based topics, as most msgpack types have a JSON-equivalent. For bytestrings, we type them using `{"description": "msgpack bytes"}`, which is currently just interpreted like `{}` (allow all types). If you don't want to hand-write it, for generating an initial json schema from a payload we like https://github.com/quicktype/quicktype ## How strict should my schema be? If in doubt, we recommend that schemas are only as strict as is minimally required by all consumers and downstream code required by Sentry. However it is ultimately up to the owners of the schema to decide whether a stricter schema is appropriate in particular scenarios. ## Adding example messages Example messages can be placed in the `examples` directory and referenced from the relevant topic/version. Example messages must be stripped of **all** customer related data. This also includes things like organization and project IDs, which should be replaced with something like `project_id: 1` or `org_id: 1`. ## Defining topics Each topic is a yaml file in the topics directory. This topic name is a "logical" topic name as many services in Sentry support overriding the default name to a different physical topic name if desired. Topic names must be unique in Sentry: the same name cannot be used for different types of data. The yaml file of a topic has the following keys: 1. `schemas`. Schemas is an array. The following should be provided for each schema: - `version`: Incrementing integer. Should start at 1. - `compatibility_mode`: `none` or `backward`. - `type`: Can be either `json` or `msgpack`. In both cases we use jsonschema to define the message schema. - `resource`: Should match the file name in the `schemas` directory - `examples`: Should match the file names in the `examples` directory 2. `topic_configuration_config`. Configuration used to create the topic 3. `services`. Which Sentry services produce to and consume from the topic. 4. `description`. 5. `pipeline`. ## Using the schema (in Python) ```python from sentry_kafka_schemas import get_codec, ValidationError from sentry_kafka_schemas.schema_types.ingest_metrics_v1 import IngestMetric SCHEMA: Codec[IngestMetric] = get_codec("ingest-metrics") try: decoded = SCHEMA.decode(b'{"type": "c", ...}') except ValidationError: return # ingest-metrics schema defines retention_days as required type, so this is # safe. retention_days = decoded["retention_days"] ``` ### Using Python types Python types are automatically generated under `sentry_kafka_schemas.schema_types`. A schema for version 1 of the topic `foo-bar` is exported under `sentry_kafka_schemas.schema_types.foo_bar_v1`. Use `title` attribute on your JSON schema and the various definitions to assign them a stable name. For example: ```javascript // a schema referenced from `topics/events.yaml, containing topic: events { "title": "main_schema", "description": "Some additional information about the schema." "properties": { "subfield": {"$ref": "#/definitions/SubSchema"} }, "definitions": { "SubSchema": { "type": "object", "title": "sub_schema" } } } ``` Produces: ```python # file: sentry_kafka_schemas/schema_types/events_v1.py class MainSchema(TypedDict, total=False): """Some additional information about the schema.""" subfield: SubSchema class SubSchema(TypedDict, total=False): ... ``` `title` can be added at any level, not just within `definitions`, to produce types. Use that power tastefully! ## Using Rust types We use a completely different library for generating Rust types, and therefore the rules by which Rust type names are generated are different. **Rust types are work-in-progress.** For now, schema files need to be explicitly added to `rust/build.rs`. The generated types can be viewed with `make view-rust-types`, `cargo doc --open`, or online on https://docs.rs/sentry-kafka-schemas. ## Release process and development install For releasing a new stable version from main branch, go to [Actions](https://github.com/getsentry/sentry-kafka-schemas/actions) and trigger a new job for the `Release` workflow. We usually just increment the `patch` number for schema changes. e.g. If the last version was 0.1.11, the next version should be 0.1.12. Check https://github.com/getsentry/sentry-kafka-schemas/releases for the latest release numbers. After releasing a new version, you should immediately bump Sentry, Snuba and Relay to ensure that all services are synchronized onto the new schema as soon as possible. Most likely you are working on a PR to Snuba or Sentry where you already want to use those types. You can do that by running `make build` in this repo, then running `pip install -e ~/projects/sentry-kafka-schemas/`. You need to re-run `make build` to update types -- they do not automatically change with schema changes even if you install this package in development mode. To stop using a development version of this repo in whichever service you're working on, you can reinstall Python dependencies in that repo. Most likely the command is `make install-py-dev`. ## Schema ownership All topics definitions, schemas and examples should have a defined owner or multiple owners if shared. The CODEOWNERS file should be updated with this information whenever new schemas and topics are added. Review is only required from one team/owner, not from all of them.