RDF Canonicalization

Deterministic serialization and semantic equality testing.
Sign, cache, and compare RDF graphs reliably.

The Problem

RDF graphs with blank nodes can be serialized in countless different ways while representing the exact same information.

Document 1

_:alice <foaf:name> "Alice" .
_:alice <foaf:knows> _:bob .
_:bob <foaf:name> "Bob" .

Document 2

_:person1 <foaf:name> "Alice" .
_:person1 <foaf:knows> _:person2 .
_:person2 <foaf:name> "Bob" .

Same information, different blank node labels. String comparison fails. Object equality fails. How do you:

  • Digitally sign RDF data?
  • Use RDF graphs as cache keys?
  • Detect if two graphs are semantically identical?
  • Synchronize RDF data reliably?

The Solution

RDF Canonicalization transforms any RDF graph into a single, deterministic representation.

Both Documents Produce Identical Canonical Form

_:c14n0 <foaf:name> "Alice" .
_:c14n0 <foaf:knows> _:c14n1 .
_:c14n1 <foaf:name> "Bob" .

✓ Deterministic blank node labels
✓ Consistent ordering
✓ Reliable for signatures and comparison

Getting Started

Install the package

dart pub add locorda_rdf_canonicalization
dart pub add locorda_rdf_core  # For creating RDF graphs

Compare two graphs

import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

void main() {
  // Two Turtle/N-Triples documents with identical semantic content
  // but different blank node labels
  final turtle1 = '''
    _:alice <http://xmlns.com/foaf/0.1/name> "Alice" .
    _:alice <http://xmlns.com/foaf/0.1/knows> _:bob .
    _:bob <http://xmlns.com/foaf/0.1/name> "Bob" .
  ''';

  final turtle2 = '''
    _:person1 <http://xmlns.com/foaf/0.1/name> "Alice" .
    _:person1 <http://xmlns.com/foaf/0.1/knows> _:person2 .
    _:person2 <http://xmlns.com/foaf/0.1/name> "Bob" .
  ''';

  // Parse both documents
  final graph1 = turtle.decode(turtle1);
  final graph2 = turtle.decode(turtle2);

  // They are different as strings and objects
  print('Strings identical: ${turtle1 == turtle2}'); // false
  print('Objects equal: ${graph1 == graph2}'); // false

  // But they are semantically equivalent (isomorphic)
  print('Isomorphic: ${isIsomorphicGraphs(graph1, graph2)}'); // true

  // Canonicalization produces identical output
  final canonical1 = canonicalizeGraph(graph1);
  final canonical2 = canonicalizeGraph(graph2);
  print('Canonical identical: ${canonical1 == canonical2}'); // true
}

Compare two datasets

import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

void main() {
  // RDF Canonicalization is defined for RDF datasets, not just single
  // graphs. The same comparison with N-Quads input (the last quad of
  // each document belongs to a named graph) looks like this:
  final nquads1 = '''
    _:alice <http://xmlns.com/foaf/0.1/name> "Alice" .
    _:alice <http://xmlns.com/foaf/0.1/knows> _:bob .
    _:bob <http://xmlns.com/foaf/0.1/name> "Bob" .
    _:alice <http://xmlns.com/foaf/0.1/age> "30" <http://example.org/graph1> .
  ''';

  final nquads2 = '''
    _:person1 <http://xmlns.com/foaf/0.1/name> "Alice" .
    _:person1 <http://xmlns.com/foaf/0.1/knows> _:person2 .
    _:person2 <http://xmlns.com/foaf/0.1/name> "Bob" .
    _:person1 <http://xmlns.com/foaf/0.1/age> "30" <http://example.org/graph1> .
  ''';

  // Parse both documents
  final dataset1 = nquads.decode(nquads1);
  final dataset2 = nquads.decode(nquads2);

  // They are different as strings and objects
  print('Strings identical: ${nquads1 == nquads2}'); // false
  print('Objects equal: ${dataset1 == dataset2}'); // false

  // But they are semantically equivalent (isomorphic)
  print('Isomorphic: ${isIsomorphic(dataset1, dataset2)}'); // true

  // Canonicalization produces identical output
  final canonical1 = canonicalize(dataset1);
  final canonical2 = canonicalize(dataset2);
  print('Canonical identical: ${canonical1 == canonical2}'); // true
}

Pre-compute canonical forms

import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

void main() {
  // Canonicalization hashes blank nodes with a cryptographic hash function,
  // so it is not free. When the same graphs are compared repeatedly,
  // compute each canonical form once and reuse it.

  final graphs = <RdfGraph>[];

  // Create multiple graphs for comparison
  for (int i = 0; i < 100; i++) {
    final graph = RdfGraph(triples: [
      Triple(BlankNodeTerm(), const IriTerm('http://example.org/id'),
          LiteralTerm.string('$i')),
    ]);
    graphs.add(graph);
  }

  // Wrap into CanonicalRdfGraph. A CanonicalRdfGraph will lazily compute the
  // canonical form on first access and cache it. Subsequent accesses to
  // the canonical form will be O(1).
  final canonicalGraphs = graphs.map((g) => CanonicalRdfGraph(g)).toList();

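  // Each comparison now reuses the cached canonical forms instead of
  // re-running graph isomorphism. (Every graph in this example has a
  // distinct literal, so no pair is reported as isomorphic.)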
  for (int i = 0; i < canonicalGraphs.length; i++) {
    for (int j = i + 1; j < canonicalGraphs.length; j++) {
      if (canonicalGraphs[i] == canonicalGraphs[j]) {
        print('Graphs $i and $j are isomorphic');
      }
    }
  }
}

Use Cases

πŸ” Digital Signatures

Sign RDF data reliably. Canonical forms ensure the same graph always produces the same signature, regardless of serialization order or blank node labels.
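
As a minimal sketch of the first step of a signing flow, the canonical form can be hashed and the digest handed to whatever signature scheme you use. This assumes `package:crypto` has been added as a dependency and that `canonicalizeGraph` returns a `String`, as described under Core API. The helper name `canonicalDigest` is hypothetical and not part of the package.

```dart
import 'dart:convert';

import 'package:crypto/crypto.dart';
import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

/// Hypothetical helper: hex-encoded SHA-256 digest of the canonical
/// N-Quads form of [graph]. Sign this digest (or the canonical string
/// itself) with the signature library of your choice.
String canonicalDigest(RdfGraph graph) {
  final canonical = canonicalizeGraph(graph);
  return sha256.convert(utf8.encode(canonical)).toString();
}

void main() {
  final a = turtle.decode('_:x <http://example.org/p> "v" .');
  final b = turtle.decode('_:y <http://example.org/p> "v" .');

  // The graphs differ only in blank node labels, so their digests
  // should match.
  print(canonicalDigest(a) == canonicalDigest(b));
}
```

Hashing the canonical string rather than the original serialization is what makes the digest independent of statement order and blank node labels.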

💾 Caching & Deduplication

Use canonical forms as consistent cache keys. Detect duplicate graphs even when they're serialized differently.
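
One way to do this is to key an ordinary `Map` by the canonical string. The `GraphCache` class below is a hypothetical helper written for this illustration, not part of the package. It assumes `canonicalizeGraph` returns a `String`, as described under Core API.

```dart
import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

/// Hypothetical helper: memoizes one value per canonical graph form.
class GraphCache<V> {
  final _entries = <String, V>{};

  /// Returns the cached value for [graph], calling [compute] only when
  /// no graph with the same canonical form has been seen before.
  V lookup(RdfGraph graph, V Function() compute) =>
      _entries.putIfAbsent(canonicalizeGraph(graph), compute);

  int get size => _entries.length;
}

void main() {
  final cache = GraphCache<String>();
  var computeCalls = 0;

  final a = turtle.decode('_:x <http://example.org/p> "v" .');
  final b = turtle.decode('_:y <http://example.org/p> "v" .');

  for (final graph in [a, b]) {
    cache.lookup(graph, () {
      computeCalls++;
      return 'expensive result';
    });
  }

  // The second graph is a relabeled copy of the first, so it should hit
  // the existing cache entry instead of creating a new one.
  print('entries: ${cache.size}, compute calls: $computeCalls');
}
```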

🔄 Data Synchronization

Detect changes in RDF datasets reliably. Compare canonical forms to know if data has actually changed, not just been re-serialized.
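
A change check along these lines can be sketched by remembering the last canonical form and comparing it with the next one. `ChangeDetector` is a hypothetical class used only for illustration. The sketch relies on `canonicalize`, `RdfDataset`, and `nquads.decode` as they appear elsewhere on this page.

```dart
import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';

/// Hypothetical helper: remembers the canonical form of the last
/// dataset it saw and reports whether a new dataset differs from it.
class ChangeDetector {
  String? _lastCanonical;

  /// Returns true if [dataset] differs from the previously seen
  /// dataset (or if this is the first dataset seen).
  bool update(RdfDataset dataset) {
    final canonical = canonicalize(dataset);
    final changed = canonical != _lastCanonical;
    _lastCanonical = canonical;
    return changed;
  }
}

void main() {
  final detector = ChangeDetector();

  final snapshot = nquads.decode('_:a <http://example.org/p> "1" .');
  final relabeled = nquads.decode('_:b <http://example.org/p> "1" .');
  final edited = nquads.decode('_:b <http://example.org/p> "2" .');

  print(detector.update(snapshot)); // first snapshot counts as a change
  print(detector.update(relabeled)); // only labels differ: should be no change
  print(detector.update(edited)); // literal differs: should be a change
}
```

Storing a digest of the canonical form instead of the full string would work the same way and keeps the stored state small.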

βš–οΈ Graph Comparison

Test semantic equality between different RDF representations. Isomorphism testing made simple and efficient.

📋 Compliance

Meet requirements for deterministic RDF serialization. W3C standards-compliant implementation.

🧪 Testing

Write reliable RDF tests. Compare expected and actual graphs semantically, not syntactically.
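
A semantic assertion might look like the following sketch. It assumes `package:test` is available as a dev dependency and uses `isIsomorphicGraphs` as shown in Getting Started. The test name and sample data are made up for illustration.

```dart
import 'package:locorda_rdf_canonicalization/canonicalization.dart';
import 'package:locorda_rdf_core/core.dart';
import 'package:test/test.dart';

void main() {
  test('actual graph matches expected graph up to blank node labels', () {
    final expected = turtle.decode('''
      _:a <http://xmlns.com/foaf/0.1/name> "Alice" .
    ''');

    // Stand-in for the output of the code under test; its blank node
    // labels will usually differ from the ones in the fixture.
    final actual = turtle.decode('''
      _:b1 <http://xmlns.com/foaf/0.1/name> "Alice" .
    ''');

    expect(isIsomorphicGraphs(actual, expected), isTrue);
  });
}
```

Comparing with `isIsomorphicGraphs` rather than `==` keeps the test stable when the code under test generates fresh blank node labels on every run.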

Key Features

📊 Graph & Dataset Support

Canonicalize both simple RDF graphs and complex datasets with named graphs. Full support for quads and multiple graph contexts.

🎯 Deterministic Blank Nodes

Hash-based blank node labeling algorithm ensures the same graph structure always gets the same blank node identifiers.

⚡ Optimized Performance

Canonical forms are computed once and cached, so repeated equality checks become comparisons of cached values instead of fresh isomorphism computations. Well suited to comparing many graphs against each other.

🔧 Configurable Hashing

Choose between SHA-256 (the default in the W3C specification) and SHA-384 based on your needs. Custom blank node prefixes are supported.

πŸ“ Standards Compliant

Implements the W3C RDF Dataset Canonicalization specification. Interoperable with other compliant implementations.

🔗 Seamless Integration

Works perfectly with locorda_rdf_core. Parse any RDF format, canonicalize, and serialize back to any format.

Core API

canonicalize()

Canonicalizes an RdfDataset to a canonical N-Quads string.

final canonical = canonicalize(dataset);

canonicalizeGraph()

Canonicalizes an RdfGraph to a canonical N-Quads string.

final canonical = canonicalizeGraph(graph);

isIsomorphic()

Tests whether two RdfDatasets are semantically equivalent (isomorphic). For two RdfGraphs, use isIsomorphicGraphs().

if (isIsomorphic(dataset1, dataset2)) {
  print('Semantically identical!');
}

CanonicalRdfGraph

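Wraps an RdfGraph and caches its canonical form for efficient repeated comparison.

final canonical1 = CanonicalRdfGraph(graph1);
final canonical2 = CanonicalRdfGraph(graph2);
// Equality compares the lazily computed, cached canonical forms
if (canonical1 == canonical2) {
  print('Graphs are isomorphic');
}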

Ready to Canonicalize?

Start using deterministic RDF serialization in your Dart projects.