Tor Protocol Specification

                              Roger Dingledine
                               Nick Mathewson

Table of Contents

    0. Preliminaries
        0.1. Notation and encoding
        0.2. Security parameters
        0.3. Ciphers
        0.4. A bad hybrid encryption algorithm, for legacy purposes
    1. System overview
        1.1. Keys and names
    2. Connections
        2.1. Picking TLS ciphersuites
        2.2. TLS security considerations
    3. Cell Packet format
    4. Negotiating and initializing connections
        4.1. Negotiating versions with VERSIONS cells
        4.2. CERTS cells
        4.3. AUTH_CHALLENGE cells
        4.4. AUTHENTICATE cells
            4.4.1. Link authentication type 1: RSA-SHA256-TLSSecret
            4.4.2. Link authentication type 3: Ed25519-SHA256-RFC5705
        4.5. NETINFO cells
    5. Circuit management
        5.1. CREATE and CREATED cells
            5.1.1. Choosing circuit IDs in create cells
            5.1.2. EXTEND and EXTENDED cells
            5.1.3. The "TAP" handshake
            5.1.4. The "ntor" handshake
            5.1.5. CREATE_FAST/CREATED_FAST cells
        5.2. Setting circuit keys
            5.2.1. KDF-TOR
            5.2.2. KDF-RFC5869
        5.3. Creating circuits
            5.3.1. Canonical connections
        5.4. Tearing down circuits
        5.5. Routing relay cells
            5.5.1. Circuit ID Checks
            5.5.2. Forward Direction
                5.5.2.1. Routing from the Origin
                5.5.2.2. Relaying Forward at Onion Routers
            5.5.3. Backward Direction
                5.5.3.1. Relaying Backward at Onion Routers
            5.5.4. Routing to the Origin
        5.6. Handling relay_early cells
    6. Application connections and stream management
        6.1. Relay cells
            6.1.1. Calculating the 'Digest' field
        6.2. Opening streams and transferring data
            6.2.1. Opening a directory stream
        6.3. Closing streams
        6.4. Remote hostname lookup
    7. Flow control
        7.1. Link throttling
        7.2. Link padding
        7.3. Circuit-level flow control
            7.3.1. SENDME Cell Format
        7.4. Stream-level flow control
    8. Handling resource exhaustion
        8.1. Memory exhaustion
    9. Subprotocol versioning
        9.1. "Link"
        9.2. "LinkAuth"
        9.3. "Relay"
        9.4. "HSIntro"
        9.5. "HSRend"
        9.6. "HSDir"
        9.7. "DirCache"
        9.8. "Desc"
        9.9. "Microdesc"
        9.10. "Cons"
        9.11. "Padding"
        9.12. "FlowCtrl"

Note: This document aims to specify Tor as currently implemented, though it may take a little time to become fully up to date. Future versions of Tor may implement improved protocols, and compatibility is not guaranteed. We may or may not remove compatibility notes for obsolete versions of Tor as they become obsolete.

This specification is not a design document; most design criteria are not examined. For more information on why Tor acts as it does, see tor-design.pdf.

Preliminaries

      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in
      RFC 2119.

Notation and encoding

   KP -- a public key for an asymmetric cipher.
   KS -- a private key for an asymmetric cipher.
   K  -- a key for a symmetric cipher.
   N  -- a "nonce", a random value, usually deterministically chosen
         from other inputs using hashing.

   a|b -- concatenation of 'a' and 'b'.

[A0 B1 C2] -- a three-byte sequence, containing the bytes with hexadecimal values A0, B1, and C2, in that order.

H(m) -- a cryptographic hash of m.

We use "byte" and "octet" interchangeably. Possibly we shouldn't.

Some specs mention "base32". This means RFC4648, without "=" padding.
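
As an illustration of the unpadded base32 form, here is a minimal Python sketch. The helper names b32_encode_nopad and b32_decode_nopad are ours, not part of any Tor API; the sketch only assumes that the standard library's RFC 4648 codec is an acceptable stand-in.

```python
import base64


def b32_encode_nopad(data: bytes) -> str:
    """RFC 4648 base32 with the trailing '=' padding removed."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def b32_decode_nopad(text: str) -> bytes:
    """Restore the padding that the RFC 4648 decoder expects, then decode."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text + padding)
```

The decoder re-adds padding because Python's base64.b32decode rejects input whose length is not a multiple of 8 characters.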

Encoding integers

Unless we explicitly say otherwise below, all numeric values in the Tor protocol are encoded in network (big-endian) order. So a "32-bit integer" means a big-endian 32-bit integer; a "2-byte" integer means a big-endian 16-bit integer, and so forth.
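
A short sketch of the integer encoding rule, using Python's struct module. The function names are illustrative only.

```python
import struct


def encode_u16(value: int) -> bytes:
    """Encode a "2-byte integer" in network (big-endian) order."""
    return struct.pack(">H", value)


def encode_u32(value: int) -> bytes:
    """Encode a "32-bit integer" in network (big-endian) order."""
    return struct.pack(">I", value)


def decode_u32(data: bytes) -> int:
    """Decode exactly four bytes as a big-endian unsigned integer."""
    (value,) = struct.unpack(">I", data)
    return value
```

For example, encode_u16(3) yields the two-byte sequence [00 03].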

Security parameters

Tor uses a stream cipher, a public-key cipher, the Diffie-Hellman protocol, and a hash function.

KEY_LEN -- the length of the stream cipher's key, in bytes.

   KP_ENC_LEN -- the length of a public-key encrypted message, in bytes.
   KP_PAD_LEN -- the number of bytes added in padding for public-key
     encryption, in bytes. (The largest number of bytes that can be encrypted
     in a single public-key operation is therefore KP_ENC_LEN-KP_PAD_LEN.)

   DH_LEN -- the number of bytes used to represent a member of the
     Diffie-Hellman group.
   DH_SEC_LEN -- the number of bytes used in a Diffie-Hellman private key (x).

   HASH_LEN -- the length of the hash function's output, in bytes.

   PAYLOAD_LEN -- The longest allowable cell payload, in bytes. (509)

   CELL_LEN(v) -- The length of a Tor cell, in bytes, for link protocol
      version v.
       CELL_LEN(v) = 512    if v is less than 4;
                   = 514    otherwise.
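
The CELL_LEN(v) rule can be written as a small function. This is a sketch; the names cell_len and cell_header_len are ours.

```python
PAYLOAD_LEN = 509  # longest allowable cell payload, in bytes


def cell_len(link_version: int) -> int:
    """Length of a fixed-length cell for the given link protocol version."""
    return 512 if link_version < 4 else 514


def cell_header_len(link_version: int) -> int:
    """Bytes of a fixed-length cell that precede the payload."""
    return cell_len(link_version) - PAYLOAD_LEN
```

The two-byte difference between the cell sizes corresponds to the wider circuit ID field introduced in link protocol version 4.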

Ciphers

These are the ciphers we use unless otherwise specified. Several of them are deprecated for new use.

For a stream cipher, unless otherwise specified, we use 128-bit AES in counter mode, with an IV of all 0 bytes. (We also require AES256.)

For a public-key cipher, unless otherwise specified, we use RSA with 1024-bit keys and a fixed exponent of 65537. We use OAEP-MGF1 padding, with SHA-1 as its digest function. We leave the optional "Label" parameter unset. (For OAEP padding, see ftp://ftp.rsasecurity.com/pub/pkcs/pkcs-1/pkcs-1v2-1.pdf)

We also use the Curve25519 group and the Ed25519 signature format in several places.

For Diffie-Hellman, unless otherwise specified, we use a generator (g) of 2. For the modulus (p), we use the 1024-bit safe prime from rfc2409 section 6.2 whose hex representation is:

     "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
     "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
     "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
     "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
     "49286651ECE65381FFFFFFFFFFFFFFFF"

As an optimization, implementations SHOULD choose DH private keys (x) of 320 bits. Implementations that do this MUST NOT use any DH key more than once. [May other implementations reuse their DH keys?? -RD] [Probably not. Conceivably, you could get away with changing DH keys once per second, but there are too many oddball attacks for me to be comfortable that this is safe. -NM]

For a hash function, unless otherwise specified, we use SHA-1.

KEY_LEN=16. DH_LEN=128; DH_SEC_LEN=40. KP_ENC_LEN=128; KP_PAD_LEN=42. HASH_LEN=20.
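
To make the Diffie-Hellman parameters concrete, the following Python sketch performs one exchange with g = 2, the modulus above, and 320-bit private keys. It is an illustration only: it is not constant-time, it omits the validation of peer public values that a real implementation needs, and the function names are ours.

```python
import secrets

DH_LEN = 128      # bytes used to represent a group member
DH_SEC_LEN = 40   # bytes in a private key (320 bits)
G = 2
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD129024E08"
    "8A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B"
    "302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9"
    "A637ED6B0BFF5CB6F406B7EDEE386BFB5A899FA5AE9F24117C4B1FE6"
    "49286651ECE65381FFFFFFFFFFFFFFFF",
    16,
)


def dh_keypair():
    """Return a fresh (private, public) pair; never reuse the private key."""
    x = secrets.randbits(DH_SEC_LEN * 8)
    return x, pow(G, x, P)


def dh_shared(private_key: int, peer_public: int) -> bytes:
    """Compute the shared group element, encoded big-endian in DH_LEN bytes."""
    return pow(peer_public, private_key, P).to_bytes(DH_LEN, "big")
```

Both sides derive the same DH_LEN-byte value: dh_shared(x_a, pub_b) equals dh_shared(x_b, pub_a).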

We also use SHA256 and SHA3-256 in some places.

When we refer to "the hash of a public key", unless otherwise specified, we mean the SHA-1 hash of the DER encoding of an ASN.1 RSA public key (as specified in PKCS.1).

All "random" values MUST be generated with a cryptographically strong pseudorandom number generator seeded from a strong entropy source, unless otherwise noted.

A bad hybrid encryption algorithm, for legacy purposes

Some specifications will refer to the "legacy hybrid encryption" of a byte sequence M with a public key KP. It is computed as follows:

      1. If the length of M is no more than KP_ENC_LEN-KP_PAD_LEN,
         pad and encrypt M with KP.
      2. Otherwise, generate a KEY_LEN byte random key K.
         Let M1 = the first KP_ENC_LEN-KP_PAD_LEN-KEY_LEN bytes of M,
         and let M2 = the rest of M.
         Pad and encrypt K|M1 with KP.  Encrypt M2 with our stream cipher,
         using the key K.  Concatenate these encrypted values.

Note that this "hybrid encryption" approach does not prevent an attacker from adding or removing bytes to the end of M. It also allows attackers to modify the bytes not covered by the OAEP -- see Goldberg's PET2006 paper for details. Do not use it as the basis for new protocols! Also note that as used in Tor's protocols, case 1 never occurs.
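
The way M is divided between the public-key operation and the stream cipher can be sketched as follows. This sketch only computes the plaintext inputs to each operation; the actual RSA-OAEP and AES-CTR steps are left out, and the function name hybrid_split is ours.

```python
import secrets

KEY_LEN = 16
KP_ENC_LEN = 128
KP_PAD_LEN = 42


def hybrid_split(m: bytes):
    """Return (pk_plaintext, stream_key, stream_plaintext) for message m.

    pk_plaintext is what gets padded and encrypted with KP.
    stream_plaintext is what gets encrypted with the stream cipher
    under stream_key. In case 1, stream_key is None and
    stream_plaintext is empty.
    """
    max_pk = KP_ENC_LEN - KP_PAD_LEN
    if len(m) <= max_pk:
        # Case 1: the whole message fits in one public-key operation.
        return m, None, b""
    # Case 2: a fresh random key K travels inside the public-key part.
    k = secrets.token_bytes(KEY_LEN)
    split = max_pk - KEY_LEN
    m1, m2 = m[:split], m[split:]
    return k + m1, k, m2
```

With the parameter values given above, M1 is 70 bytes long, so K|M1 is exactly 86 bytes, the largest input a single public-key operation accepts.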

System overview

Tor is a distributed overlay network designed to anonymize low-latency TCP-based applications such as web browsing, secure shell, and instant messaging. Clients choose a path through the network and build a "circuit", in which each node (or "onion router" or "OR") in the path knows its predecessor and successor, but no other nodes in the circuit. Traffic flowing down the circuit is sent in fixed-size "cells", which are unwrapped by a symmetric key at each node (like the layers of an onion) and relayed downstream.

Keys and names

Every Tor relay has multiple public/private keypairs:

These are 1024-bit RSA keys:

    - A long-term signing-only "Identity key" used to sign documents and
      certificates, and used to establish relay identity.
      KP_relayid_rsa, KS_relayid_rsa.
    - A medium-term TAP "Onion key" used to decrypt onion skins when accepting
      circuit extend attempts.  (See 5.1.)  Old keys MUST be accepted for a
      while after they are no longer advertised.  Because of this,
      relays MUST retain old keys for a while after they're rotated.  (See
      "onion key lifetime parameters" in dir-spec.txt.)
      KP_onion_tap, KS_onion_tap.
    - A short-term "Connection key" used to negotiate TLS connections.
      Tor implementations MAY rotate this key as often as they like, and
      SHOULD rotate this key at least once a day.
      KP_conn_tls, KS_conn_tls.

   This is a Curve25519 key:

    - A medium-term ntor "Onion key" used to handle onion key handshakes when
      accepting incoming circuit extend requests.  As with TAP onion keys,
      old ntor keys MUST be accepted for at least one week after they are no
      longer advertised.  Because of this, relays MUST retain old keys for a
      while after they're rotated. (See "onion key lifetime parameters" in
      dir-spec.txt.)
      KP_ntor, KS_ntor.

   These are Ed25519 keys:

    - A long-term "master identity" key.  This key never
      changes; it is used only to sign the "signing" key below.  It may be
      kept offline.
      KP_relayid_ed, KS_relayid_ed.
    - A medium-term "signing" key.  This key is signed by the master identity
      key, and must be kept online.  A new one should be generated
      periodically.  It signs nearly everything else.
      KP_relaysign_ed, KS_relaysign_ed.
    - A short-term "link authentication" key, used to authenticate
      the link handshake: see section 4 below.  This key is signed
      by the "signing" key, and should be regenerated frequently.
      KP_link_ed, KS_link_ed.

KP_relayid_* together identify a router uniquely. Once a router has used a KP_relayid_ed (an Ed25519 master identity key) together with a given KP_relayid_rsa (RSA identity key), neither of those keys may ever be used with a different key.

We write KP_relayid to refer to a key which is either KP_relayid_rsa or KP_relayid_ed.

The same key or keypair should never be used for separate roles within the Tor protocol suite, unless specifically stated. For example, a relay's identity keys K_relayid should not also be used as the identity keypair for a hidden service K_hs_id (see rend-spec-v3.txt).

Connections

Connections between two Tor relays, or between a client and a relay, use TLS/SSLv3 for link authentication and encryption. All implementations MUST support the SSLv3 ciphersuite "TLS_DHE_RSA_WITH_AES_128_CBC_SHA" if it is available. They SHOULD support better ciphersuites if available.

There are three ways to perform TLS handshakes with a Tor server. In the first way, "certificates-up-front", both the initiator and responder send a two-certificate chain as part of their initial handshake. (This is supported in all Tor versions.) In the second way, "renegotiation", the responder provides a single certificate, and the initiator immediately performs a TLS renegotiation. (This is supported in Tor 0.2.0.21 and later.) And in the third way, "in-protocol", the initial TLS negotiation completes, and the parties bootstrap themselves to mutual authentication via use of the Tor protocol without further TLS handshaking. (This is supported in 0.2.3.6-alpha and later.)

Each of these options provides a way for the parties to learn it is available: a client does not need to know the version of the Tor server in order to connect to it properly.

In "certificates up-front" (a.k.a. "the v1 handshake"), the connection initiator always sends a two-certificate chain, consisting of an X.509 certificate using a short-term connection public key and a second, self-signed X.509 certificate containing its identity key. The other party sends a similar certificate chain. The initiator's ClientHello MUST NOT include any ciphersuites other than:

     TLS_DHE_RSA_WITH_AES_256_CBC_SHA
     TLS_DHE_RSA_WITH_AES_128_CBC_SHA
     SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA

In "renegotiation" (a.k.a. "the v2 handshake"), the connection initiator sends no certificates, and the responder sends a single connection certificate. Once the TLS handshake is complete, the initiator renegotiates the handshake, with each party sending a two-certificate chain as in "certificates up-front". The initiator's ClientHello MUST include at least one ciphersuite not in the list above -- that's how the initiator indicates that it can handle this handshake. For other considerations on the initiator's ClientHello, see section 2.1 below.

In "in-protocol" (a.k.a. "the v3 handshake"), the initiator sends no certificates, and the responder sends a single connection certificate. The choice of ciphersuites must be as in a "renegotiation" handshake. There are additionally a set of constraints on the connection certificate, which the initiator can use to learn that the in-protocol handshake is in use. Specifically, at least one of these properties must be true of the certificate:

      * The certificate is self-signed
      * Some component other than "commonName" is set in the subject or
        issuer DN of the certificate.
      * The commonName of the subject or issuer of the certificate ends
        with a suffix other than ".net".
      * The certificate's public key modulus is longer than 1024 bits.
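
The four certificate properties above can be checked as in the sketch below. The CertSummary type and its fields are hypothetical: a real implementation would fill them in from a parsed X.509 certificate. Treating a DN with no commonName as "not ending in .net" is our assumption, not something this document states.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CertSummary:
    """Hypothetical summary of the responder's connection certificate."""
    self_signed: bool
    subject_dn: Dict[str, str]   # e.g. {"commonName": "www.example.net"}
    issuer_dn: Dict[str, str]
    modulus_bits: int


def indicates_in_protocol(cert: CertSummary) -> bool:
    """True if at least one listed property holds for the certificate."""
    if cert.self_signed:
        return True
    for dn in (cert.subject_dn, cert.issuer_dn):
        # Some component other than commonName is set.
        if any(name != "commonName" for name in dn):
            return True
        # The commonName ends with a suffix other than ".net".
        # Assumption: a missing commonName counts as not ending in ".net".
        if not dn.get("commonName", "").endswith(".net"):
            return True
    return cert.modulus_bits > 1024
```

An initiator would fall back to the "renegotiation" handshake only when this function returns False.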

The initiator then sends a VERSIONS cell to the responder, which then replies with a VERSIONS cell; they have then negotiated a Tor protocol version. Assuming that the version they negotiate is 3 or higher (the only ones specified for use with this handshake right now), the responder sends a CERTS cell, an AUTH_CHALLENGE cell, and a NETINFO cell to the initiator, which may send either CERTS, AUTHENTICATE, NETINFO if it wants to authenticate, or just NETINFO if it does not.

For backward compatibility between later handshakes and "certificates up-front", the ClientHello of an initiator that supports a later handshake MUST include at least one ciphersuite other than those listed above. The connection responder examines the initiator's ciphersuite list to see whether it includes any ciphers other than those included in the list above. If extra ciphers are included, the responder proceeds as in "renegotiation" and "in-protocol": it sends a single certificate and does not request client certificates. Otherwise (in the case that no extra ciphersuites are included in the ClientHello) the responder proceeds as in "certificates up-front": it requests client certificates, and sends a two-certificate chain. In either case, once the responder has sent its certificate or certificates, the initiator counts them. If two certificates have been sent, it proceeds as in "certificates up-front"; otherwise, it proceeds as in "renegotiation" or "in-protocol".
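
The decision logic in the paragraph above might be sketched like this. Ciphersuites are represented by their names as strings for readability (an assumption; on the wire they are 16-bit identifiers), and the function names and return values are ours.

```python
V1_CIPHERSUITES = frozenset({
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
})


def responder_mode(client_suites) -> str:
    """Decide how the responder behaves from the ClientHello suite list."""
    if any(suite not in V1_CIPHERSUITES for suite in client_suites):
        # Extra ciphers present: send one certificate and do not
        # request client certificates.
        return "single-certificate"
    # Only the v1 suites: request client certificates and send a
    # two-certificate chain.
    return "certificates-up-front"


def initiator_mode(certs_received: int) -> str:
    """Decide how the initiator proceeds by counting responder certificates."""
    if certs_received == 2:
        return "certificates-up-front"
    return "renegotiation-or-in-protocol"
```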

To decide whether to do "renegotiation" or "in-protocol", the initiator checks whether the responder's initial certificate matches the criteria listed above.

All new relay implementations of the Tor protocol MUST support backwards-compatible renegotiation; clients SHOULD do this too. If this is not possible, new client implementations MUST support both "renegotiation" and "in-protocol" and use the router's published link protocols list (see dir-spec.txt on the "protocols" entry) to decide which to use.

In all of the above handshake variants, certificates sent in the clear SHOULD NOT include any strings to identify the host as a Tor relay. In the "renegotiation" and "backwards-compatible renegotiation" steps, the initiator SHOULD choose a list of ciphersuites and TLS extensions to mimic one used by a popular web browser.

Even though the connection protocol is identical, we will think of the initiator as either an onion router (OR) if it is willing to relay traffic for other Tor users, or an onion proxy (OP) if it only handles local requests. Onion proxies SHOULD NOT provide long-term-trackable identifiers in their handshakes.

In all handshake variants, once all certificates are exchanged, all parties receiving certificates must confirm that the identity key is as expected. If the key is not as expected, the party must close the connection.

(When initiating a connection, if a reasonably live consensus is available, then the expected identity key is taken from that consensus. But when initiating a connection otherwise, the expected identity key is the one given in the hard-coded authority or fallback list. Finally, when creating a connection because of an EXTEND/EXTEND2 cell, the expected identity key is the one given in the cell.)

When connecting to an OR, all parties SHOULD reject the connection if that OR has a malformed or missing certificate. When accepting an incoming connection, an OR SHOULD NOT reject incoming connections from parties with malformed or missing certificates. (However, an OR should not believe that an incoming connection is from another OR unless the certificates are present and well-formed.)

[Before version 0.1.2.8-rc, ORs rejected incoming connections from ORs and OPs alike if their certificates were missing or malformed.]

Once a TLS connection is established, the two sides send cells (specified below) to one another. Cells are sent serially. Standard cells are CELL_LEN(link_proto) bytes long, but variable-length cells also exist; see Section 3. Cells may be sent embedded in TLS records of any size or divided across TLS records, but the framing of TLS records MUST NOT leak information about the type or contents of the cells.

TLS connections are not permanent. Either side MAY close a connection if there are no circuits running over it and an amount of time (KeepalivePeriod, defaults to 5 minutes) has passed since the last time any traffic was transmitted over the TLS connection. Clients SHOULD also hold a TLS connection with no circuits open, if it is likely that a circuit will be built soon using that connection.

Client-only Tor instances are encouraged to avoid using handshake variants that include certificates, if those certificates provide any persistent tags to the relays they contact. If clients do use certificates, they SHOULD NOT keep using the same certificates when their IP address changes. Clients MAY send certificates using any of the above handshake variants.

Picking TLS ciphersuites

Clients SHOULD send a ciphersuite list chosen to emulate some popular web browser or other program common on the internet. Clients may send the "Fixed Ciphersuite List" below. If they do not, they MUST NOT advertise any ciphersuite that they cannot actually support, unless that cipher is one not supported by OpenSSL 1.0.1.

The fixed ciphersuite list is:

     TLS1_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
     TLS1_ECDHE_RSA_WITH_AES_256_CBC_SHA
     TLS1_DHE_RSA_WITH_AES_256_SHA
     TLS1_DHE_DSS_WITH_AES_256_SHA
     TLS1_ECDH_RSA_WITH_AES_256_CBC_SHA
     TLS1_ECDH_ECDSA_WITH_AES_256_CBC_SHA
     TLS1_RSA_WITH_AES_256_SHA
     TLS1_ECDHE_ECDSA_WITH_RC4_128_SHA
     TLS1_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
     TLS1_ECDHE_RSA_WITH_RC4_128_SHA
     TLS1_ECDHE_RSA_WITH_AES_128_CBC_SHA
     TLS1_DHE_RSA_WITH_AES_128_SHA
     TLS1_DHE_DSS_WITH_AES_128_SHA
     TLS1_ECDH_RSA_WITH_RC4_128_SHA
     TLS1_ECDH_RSA_WITH_AES_128_CBC_SHA
     TLS1_ECDH_ECDSA_WITH_RC4_128_SHA
     TLS1_ECDH_ECDSA_WITH_AES_128_CBC_SHA
     SSL3_RSA_RC4_128_MD5
     SSL3_RSA_RC4_128_SHA
     TLS1_RSA_WITH_AES_128_SHA
     TLS1_ECDHE_ECDSA_WITH_DES_192_CBC3_SHA
     TLS1_ECDHE_RSA_WITH_DES_192_CBC3_SHA
     SSL3_EDH_RSA_DES_192_CBC3_SHA
     SSL3_EDH_DSS_DES_192_CBC3_SHA
     TLS1_ECDH_RSA_WITH_DES_192_CBC3_SHA
     TLS1_ECDH_ECDSA_WITH_DES_192_CBC3_SHA
     SSL3_RSA_FIPS_WITH_3DES_EDE_CBC_SHA
     SSL3_RSA_DES_192_CBC3_SHA
     [*] The "extended renegotiation is supported" ciphersuite, 0x00ff, is
         not counted when checking the list of ciphersuites.

If the client sends the Fixed Ciphersuite List, the responder MUST NOT select any ciphersuite besides TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, and SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA: such ciphers might not actually be supported by the client.

If the client sends a v2+ ClientHello with a list of ciphers other than the Fixed Ciphersuite List, the responder can trust that the client supports every cipher advertised in that list, so long as that ciphersuite is also supported by OpenSSL 1.0.1.

Responders MUST NOT select any TLS ciphersuite that lacks ephemeral keys, or whose symmetric keys are less than KEY_LEN bits, or whose digests are less than HASH_LEN bits. Responders SHOULD NOT select any SSLv3 ciphersuite other than the DHE+3DES suites listed above.

TLS security considerations

Implementations MUST NOT allow TLS session resumption -- it can exacerbate some attacks (e.g. the "Triple Handshake" attack from Feb 2013), and it plays havoc with forward secrecy guarantees.

Implementations SHOULD NOT allow TLS compression -- although we don't know a way to apply a CRIME-style attack to current Tor directly, it's a waste of resources.

Cell Packet format

The basic unit of communication for onion routers and onion proxies is a fixed-width "cell".

On a version 1 connection, each cell contains the following fields:

        CircID                                [CIRCID_LEN bytes]
        Command                               [1 byte]
        Payload (padded with padding bytes)   [PAYLOAD_LEN bytes]

On a version 2 or higher connection, all cells are as in version 1 connections, except for variable-length cells, whose format is:

        CircID                                [CIRCID_LEN octets]
        Command                               [1 octet]
        Length                                [2 octets; big-endian integer]
        Payload (some commands MAY pad)       [Length bytes]

Most variable-length cells MAY be padded with padding bytes, except for VERSIONS cells, which MUST NOT contain any additional bytes. (The payload of VPADDING cells consists of padding bytes.)

On a version 2 connection, variable-length cells are indicated by a command byte equal to 7 ("VERSIONS"). On a version 3 or higher connection, variable-length cells are indicated by a command byte equal to 7 ("VERSIONS"), or greater than or equal to 128.
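
The rule for recognizing variable-length cells can be expressed as a predicate on the command byte and the negotiated link version. This is a sketch with a name of our choosing. Before a version has been negotiated, a caller has to treat the first VERSIONS cell as variable-length on its own; that case is not modeled here.

```python
VERSIONS = 7


def is_variable_length(command: int, link_version: int) -> bool:
    """True if a cell with this command byte is variable-length."""
    if link_version < 2:
        # Version 1 connections have only fixed-length cells.
        return False
    if command == VERSIONS:
        return True
    # Commands 128 and above are variable-length from version 3 onward.
    return link_version >= 3 and command >= 128
```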

CIRCID_LEN is 2 for link protocol versions 1, 2, and 3. CIRCID_LEN is 4 for link protocol version 4 or higher. The first VERSIONS cell, and any cells sent before the first VERSIONS cell, always have CIRCID_LEN == 2 for backward compatibility.
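
Putting the two cell layouts and the CIRCID_LEN rule together, cells could be serialized as in the following sketch. The function names are ours. Fixed-length payloads are padded with NUL bytes here, which is the recommendation for cells other than RELAY and RELAY_EARLY; random padding for those is not shown.

```python
import struct

PAYLOAD_LEN = 509


def circid_len(link_version: int) -> int:
    """Width of the CircID field in bytes."""
    return 4 if link_version >= 4 else 2


def pack_fixed_cell(circ_id: int, command: int, payload: bytes,
                    link_version: int) -> bytes:
    """Serialize a fixed-length cell, NUL-padding the payload."""
    if len(payload) > PAYLOAD_LEN:
        raise ValueError("payload too long for a fixed-length cell")
    header = circ_id.to_bytes(circid_len(link_version), "big") + bytes([command])
    return header + payload.ljust(PAYLOAD_LEN, b"\x00")


def pack_variable_cell(circ_id: int, command: int, payload: bytes,
                       link_version: int) -> bytes:
    """Serialize a variable-length cell with a 2-octet big-endian Length."""
    header = circ_id.to_bytes(circid_len(link_version), "big") + bytes([command])
    return header + struct.pack(">H", len(payload)) + payload
```

A fixed-length cell produced this way is 512 bytes long for link versions below 4 and 514 bytes otherwise, matching CELL_LEN(v). For the first VERSIONS cell, a caller would pass a link version below 4 so that the two-byte CircID is used.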

The CircID field determines which circuit, if any, the cell is associated with.

The 'Command' field of a fixed-length cell holds one of the following values:

         0 -- PADDING     (Padding)                 (See Sec 7.2)
         1 -- CREATE      (Create a circuit)        (See Sec 5.1)
         2 -- CREATED     (Acknowledge create)      (See Sec 5.1)
         3 -- RELAY       (End-to-end data)         (See Sec 5.5 and 6)
         4 -- DESTROY     (Stop using a circuit)    (See Sec 5.4)
         5 -- CREATE_FAST (Create a circuit, no KP) (See Sec 5.1)
         6 -- CREATED_FAST (Circuit created, no KP) (See Sec 5.1)
         8 -- NETINFO     (Time and address info)   (See Sec 4.5)
         9 -- RELAY_EARLY (End-to-end data; limited)(See Sec 5.6)
         10 -- CREATE2    (Extended CREATE cell)    (See Sec 5.1)
         11 -- CREATED2   (Extended CREATED cell)    (See Sec 5.1)
         12 -- PADDING_NEGOTIATE   (Padding negotiation)    (See Sec 7.2)

    Variable-length command values are:

         7 -- VERSIONS    (Negotiate proto version) (See Sec 4)
         128 -- VPADDING  (Variable-length padding) (See Sec 7.2)
         129 -- CERTS     (Certificates)            (See Sec 4.2)
         130 -- AUTH_CHALLENGE (Challenge value)    (See Sec 4.3)
         131 -- AUTHENTICATE (Client authentication)(See Sec 4.4)
         132 -- AUTHORIZE (Client authorization)    (Not yet used)

   The interpretation of 'Payload' depends on the type of the cell.

      VPADDING/PADDING:
               Payload contains padding bytes.
      CREATE/CREATE2:  Payload contains the handshake challenge.
      CREATED/CREATED2: Payload contains the handshake response.
      RELAY/RELAY_EARLY: Payload contains the relay header and relay body.
      DESTROY: Payload contains a reason for closing the circuit.
               (see 5.4)

Upon receiving any other value for the command field, an OR must drop the cell. Since more cell types may be added in the future, ORs should generally not warn when encountering unrecognized commands.

The cell is padded up to the cell length with padding bytes.

Senders set padding bytes depending on the cell's command:

      VERSIONS:  Payload MUST NOT contain padding bytes.
      AUTHORIZE: Payload is unspecified and reserved for future use.
      Other variable-length cells:
                 Payload MAY contain padding bytes at the end of the cell.
                 Padding bytes SHOULD be set to NUL.
      RELAY/RELAY_EARLY: Payload MUST be padded to PAYLOAD_LEN with padding
                  bytes. Padding bytes SHOULD be set to random values.
      Other fixed-length cells:
                 Payload MUST be padded to PAYLOAD_LEN with padding bytes.
                 Padding bytes SHOULD be set to NUL.

We recommend random padding in RELAY/RELAY_EARLY cells, so that the cell content is unpredictable. See the format of relay cells in section 6.1 for detail.

For other cells, TLS authenticates cell content, so randomized padding bytes are redundant.

Receivers MUST ignore padding bytes.

PADDING cells are currently used to implement connection keepalive. If there is no other traffic, ORs and OPs send one another a PADDING cell every few minutes.

CREATE, CREATE2, CREATED, CREATED2, and DESTROY cells are used to manage circuits; see section 5 below.

RELAY cells are used to send commands and data along a circuit; see section 6 below.

VERSIONS and NETINFO cells are used to set up connections in link protocols v2 and higher; in link protocol v3 and higher, CERTS, AUTH_CHALLENGE, and AUTHENTICATE may also be used. See section 4 below.

Negotiating and initializing connections

After Tor instances complete a TLS handshake using either the "renegotiation" or "in-protocol" variant, they must exchange a set of cells to set up the Tor connection and make it "open" and usable for circuits.

When the renegotiation handshake is used, both parties immediately send a VERSIONS cell (4.1 below), and after negotiating a link protocol version (which will be 2), each send a NETINFO cell (4.5 below) to confirm their addresses and timestamps. No other intervening cell types are allowed.

When the in-protocol handshake is used, the initiator sends a VERSIONS cell to indicate that it will not be renegotiating. The responder sends a VERSIONS cell, a CERTS cell (4.2 below) to give the initiator the certificates it needs to learn the responder's identity, an AUTH_CHALLENGE cell (4.3) that the initiator must include as part of its answer if it chooses to authenticate, and a NETINFO cell (4.5). As soon as it gets the CERTS cell, the initiator knows whether the responder is correctly authenticated. At this point the initiator behaves differently depending on whether it wants to authenticate or not. If it does not want to authenticate, it MUST send a NETINFO cell. If it does want to authenticate, it MUST send a CERTS cell, an AUTHENTICATE cell (4.4), and a NETINFO. When this handshake is in use, the first cell must be VERSIONS, VPADDING, or AUTHORIZE, and no other cell type is allowed to intervene besides those specified, except for VPADDING cells.

The AUTHORIZE cell type is reserved for future use by scanning-resistance designs.

[Tor versions before 0.2.3.11-alpha did not recognize the AUTHORIZE cell, and did not permit any command other than VERSIONS as the first cell of the in-protocol handshake.]

Negotiating versions with VERSIONS cells

There are multiple instances of the Tor link connection protocol. Any connection negotiated using the "certificates up front" handshake (see section 2 above) is "version 1". In any connection where both parties have behaved as in the "renegotiation" handshake, the link protocol version must be 2. In any connection where both parties have behaved as in the "in-protocol" handshake, the link protocol must be 3 or higher.

To determine the version, in any connection where the "renegotiation" or "in-protocol" handshake was used (that is, where the responder sent only one certificate at first and where the initiator did not send any certificates in the first negotiation), both parties MUST send a VERSIONS cell. In "renegotiation", they send a VERSIONS cell right after the renegotiation is finished, before any other cells are sent. In "in-protocol", the initiator sends a VERSIONS cell immediately after the initial TLS handshake, and the responder replies immediately with a VERSIONS cell. (As an exception to this rule, if both sides support the "in-protocol" handshake, either side may send VPADDING cells at any time.)

The payload in a VERSIONS cell is a series of big-endian two-byte integers. Both parties MUST select as the link protocol version the highest number contained both in the VERSIONS cell they sent and in the VERSIONS cell they received. If they have no such version in common, they cannot communicate and MUST close the connection. Either party MUST close the connection if the VERSIONS cell is not well-formed (for example, if the payload contains an odd number of bytes).
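
Version selection as described above can be sketched as follows. The function names are ours, and raising ValueError stands in for "close the connection".

```python
from typing import Iterable, List


def parse_versions_payload(payload: bytes) -> List[int]:
    """Decode a VERSIONS payload into a list of version numbers."""
    if len(payload) % 2 != 0:
        raise ValueError("malformed VERSIONS payload: odd number of bytes")
    return [int.from_bytes(payload[i:i + 2], "big")
            for i in range(0, len(payload), 2)]


def negotiate_link_version(ours: Iterable[int], their_payload: bytes) -> int:
    """Return the highest version listed by both sides."""
    common = set(ours) & set(parse_versions_payload(their_payload))
    if not common:
        raise ValueError("no link protocol version in common")
    return max(common)
```

For instance, if we list versions 3, 4, and 5 and the peer's payload is [00 03 00 04 00 06], the negotiated version is 4.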

Any VERSIONS cells sent after the first VERSIONS cell MUST be ignored. (To be interpreted correctly, later VERSIONS cells MUST have a CIRCID_LEN matching the version negotiated with the first VERSIONS cell.)

Since the version 1 link protocol does not use the "renegotiation" handshake, implementations MUST NOT list version 1 in their VERSIONS cell. When the "renegotiation" handshake is used, implementations MUST list only the version 2. When the "in-protocol" handshake is used, implementations MUST NOT list any version before 3, and SHOULD list at least version 3.

The differences between link protocol versions are:

     1 -- The "certs up front" handshake.
     2 -- Uses the renegotiation-based handshake. Introduces
          variable-length cells.
     3 -- Uses the in-protocol handshake.
     4 -- Increases circuit ID width to 4 bytes.
     5 -- Adds support for link padding and negotiation (padding-spec.txt).

CERTS cells

The CERTS cell describes the keys that a Tor instance is claiming to have. It is a variable-length cell. Its payload format is:

        N: Number of certs in cell            [1 octet]
        N times:
           CertType                           [1 octet]
           CLEN                               [2 octets]
           Certificate                        [CLEN octets]

   Any extra octets at the end of a CERTS cell MUST be ignored.

     Relevant certType values are:
        1: Link key certificate certified by RSA1024 identity
        2: RSA1024 Identity certificate, self-signed.
        3: RSA1024 AUTHENTICATE cell link certificate, signed with RSA1024 key.
        4: Ed25519 signing key, signed with identity key.
        5: TLS link certificate, signed with ed25519 signing key.
        6: Ed25519 AUTHENTICATE cell key, signed with ed25519 signing key.
        7: Ed25519 identity, signed with RSA identity.

The certificate format for certificate types 1-3 is DER encoded X509. For others, the format is as documented in cert-spec.txt. Note that type 7 uses a different format from types 4-6.

A CERTS cell may have no more than one certificate of each CertType.
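
The payload layout above can be decoded with a short loop. The following Python sketch is illustrative only: the function name is hypothetical, CLEN is assumed to be big-endian like the other integers in this document, and rejecting a duplicate CertType with an exception is one possible reaction rather than a mandated one.

```python
import struct


def parse_certs_payload(payload: bytes) -> dict[int, bytes]:
    """Return a mapping from CertType to the raw certificate bytes."""
    if len(payload) < 1:
        raise ValueError("truncated CERTS cell")
    n_certs = payload[0]
    offset = 1
    certs: dict[int, bytes] = {}
    for _ in range(n_certs):
        if offset + 3 > len(payload):
            raise ValueError("truncated CERTS cell")
        cert_type, clen = struct.unpack_from(">BH", payload, offset)
        offset += 3
        if offset + clen > len(payload):
            raise ValueError("truncated certificate")
        if cert_type in certs:
            # At most one certificate of each CertType is allowed.
            raise ValueError("duplicate CertType")
        certs[cert_type] = payload[offset:offset + clen]
        offset += clen
    # Any octets remaining after the last certificate are ignored.
    return certs
```

The returned mapping only separates the certificates; none of the signature, expiry, or key-matching checks listed below are performed by this sketch.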

To authenticate the responder as having a given Ed25519,RSA identity key combination, the initiator MUST check the following:

     * The CERTS cell contains exactly one CertType 2 "ID" certificate.
     * The CERTS cell contains exactly one CertType 4 Ed25519
       "Id->Signing" cert.
     * The CERTS cell contains exactly one CertType 5 Ed25519
       "Signing->link" certificate.
     * The CERTS cell contains exactly one CertType 7 "RSA->Ed25519"
       cross-certificate.
     * All X.509 certificates above have validAfter and validUntil dates;
       no X.509 or Ed25519 certificates are expired.
     * All certificates are correctly signed.
     * The certified key in the Signing->Link certificate matches the
       SHA256 digest of the certificate that was used to
       authenticate the TLS connection.
     * The identity key listed in the ID->Signing cert was used to
       sign the ID->Signing Cert.
     * The Signing->Link cert was signed with the Signing key listed
       in the ID->Signing cert.
     * The RSA->Ed25519 cross-certificate certifies the Ed25519
       identity, and is signed with the RSA identity listed in the
       "ID" certificate.
     * The certified key in the ID certificate is a 1024-bit RSA key.
     * The RSA ID certificate is correctly self-signed.

To authenticate the responder as having a given RSA identity only, the initiator MUST check the following:

     * The CERTS cell contains exactly one CertType 1 "Link" certificate.
     * The CERTS cell contains exactly one CertType 2 "ID" certificate.
     * Both certificates have validAfter and validUntil dates that
       are not expired.
     * The certified key in the Link certificate matches the
       link key that was used to negotiate the TLS connection.
     * The certified key in the ID certificate is a 1024-bit RSA key.
     * The certified key in the ID certificate was used to sign both
       certificates.
     * The link certificate is correctly signed with the key in the
       ID certificate.
     * The ID certificate is correctly self-signed.

In both cases above, checking these conditions is sufficient to authenticate that the initiator is talking to the Tor node with the expected identity, as certified in the ID certificate(s).

To authenticate the initiator as having a given Ed25519,RSA identity key combination, the responder MUST check the following:

     * The CERTS cell contains exactly one CertType 2 "ID" certificate.
     * The CERTS cell contains exactly one CertType 4 Ed25519
       "Id->Signing" certificate.
     * The CERTS cell contains exactly one CertType 6 Ed25519
       "Signing->auth" certificate.
     * The CERTS cell contains exactly one CertType 7 "RSA->Ed25519"
       cross-certificate.
     * All X.509 certificates above have validAfter and validUntil dates;
       no X.509 or Ed25519 certificates are expired.
     * All certificates are correctly signed.
     * The identity key listed in the ID->Signing cert was used to
       sign the ID->Signing Cert.
     * The Signing->AUTH cert was signed with the Signing key listed
       in the ID->Signing cert.
     * The RSA->Ed25519 cross-certificate certifies the Ed25519
       identity, and is signed with the RSA identity listed in the
       "ID" certificate.
     * The certified key in the ID certificate is a 1024-bit RSA key.
     * The RSA ID certificate is correctly self-signed.

To authenticate the initiator as having an RSA identity key only, the responder MUST check the following:

     * The CERTS cell contains exactly one CertType 3 "AUTH" certificate.
     * The CERTS cell contains exactly one CertType 2 "ID" certificate.
     * Both certificates have validAfter and validUntil dates that
       are not expired.
     * The certified key in the AUTH certificate is a 1024-bit RSA key.
     * The certified key in the ID certificate is a 1024-bit RSA key.
     * The certified key in the ID certificate was used to sign both
       certificates.
     * The auth certificate is correctly signed with the key in the
       ID certificate.
     * The ID certificate is correctly self-signed.

Checking these conditions is NOT sufficient to authenticate that the initiator has the ID it claims; to do so, the cells in 4.3 and 4.4 below must be exchanged.

AUTH_CHALLENGE cells

An AUTH_CHALLENGE cell is a variable-length cell with the following fields:

       Challenge [32 octets]
       N_Methods [2 octets]
       Methods   [2 * N_Methods octets]

It is sent from the responder to the initiator. Initiators MUST ignore unexpected bytes at the end of the cell. Responders MUST generate every challenge independently using a strong RNG or PRNG.

The Challenge field is a randomly generated string that the initiator must sign (a hash of) as part of authenticating. The methods are the authentication methods that the responder will accept. Only two authentication methods are defined right now: see 4.4.1 and 4.4.2 below.
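
A minimal Python sketch of building and parsing this payload is given below for illustration. The helper names are hypothetical, N_Methods and the method identifiers are assumed to be big-endian, and os.urandom stands in for whatever strong RNG an implementation uses.

```python
import os
import struct


def build_auth_challenge(methods: list[int]) -> bytes:
    """Build an AUTH_CHALLENGE payload with a fresh random challenge."""
    challenge = os.urandom(32)  # generated independently for every cell
    return challenge + struct.pack(f">H{len(methods)}H", len(methods), *methods)


def parse_auth_challenge(payload: bytes) -> tuple[bytes, list[int]]:
    """Return (Challenge, Methods); trailing bytes are ignored."""
    if len(payload) < 34:
        raise ValueError("truncated AUTH_CHALLENGE cell")
    challenge = payload[:32]
    (n_methods,) = struct.unpack_from(">H", payload, 32)
    if len(payload) < 34 + 2 * n_methods:
        raise ValueError("truncated method list")
    methods = list(struct.unpack_from(f">{n_methods}H", payload, 34))
    return challenge, methods
```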

AUTHENTICATE cells

If an initiator wants to authenticate, it responds to the AUTH_CHALLENGE cell with a CERTS cell and an AUTHENTICATE cell. The CERTS cell is as a server would send, except that instead of sending a CertType 1 (and possibly CertType 5) cert for an arbitrary link certificate, the initiator sends a CertType 3 (and possibly CertType 6) cert for an RSA/Ed25519 AUTHENTICATE key.

This difference is because we allow any link key type on a TLS link, but the protocol described here will only work for specific key types as described in 4.4.1 and 4.4.2 below.

An AUTHENTICATE cell contains the following:

        AuthType                              [2 octets]
        AuthLen                               [2 octets]
        Authentication                        [AuthLen octets]

Responders MUST ignore extra bytes at the end of an AUTHENTICATE cell. Recognized AuthTypes are 1 and 3, described in the next two sections.

Initiators MUST NOT send an AUTHENTICATE cell before they have verified the certificates presented in the responder's CERTS cell, and authenticated the responder.

Link authentication type 1: RSA-SHA256-TLSSecret

If AuthType is 1 (meaning "RSA-SHA256-TLSSecret"), then the Authentication field of the AUTHENTICATE cell contains the following:

       TYPE: The characters "AUTH0001" [8 octets]
       CID: A SHA256 hash of the initiator's RSA1024 identity key [32 octets]
       SID: A SHA256 hash of the responder's RSA1024 identity key [32 octets]
       SLOG: A SHA256 hash of all bytes sent from the responder to the
         initiator as part of the negotiation up to and including the
         AUTH_CHALLENGE cell; that is, the VERSIONS cell, the CERTS cell,
         the AUTH_CHALLENGE cell, and any padding cells.  [32 octets]
       CLOG: A SHA256 hash of all bytes sent from the initiator to the
         responder as part of the negotiation so far; that is, the
         VERSIONS cell and the CERTS cell and any padding cells. [32
         octets]
       SCERT: A SHA256 hash of the responder's TLS link certificate. [32
         octets]
       TLSSECRETS: A SHA256 HMAC, using the TLS master secret as the
         secret key, of the following:
           - client_random, as sent in the TLS Client Hello
           - server_random, as sent in the TLS Server Hello
           - the NUL terminated ASCII string:
             "Tor V3 handshake TLS cross-certification"
          [32 octets]
       RAND: A 24 byte value, randomly chosen by the initiator.  (In an
         imitation of SSL3's gmt_unix_time field, older versions of Tor
         sent an 8-byte timestamp as the first 8 bytes of this field;
         new implementations should not do that.) [24 octets]
       SIG: A signature of a SHA256 hash of all the previous fields
         using the initiator's "Authenticate" key as presented.  (As
         always in Tor, we use OAEP-MGF1 padding; see tor-spec.txt
         section 0.3.)
          [variable length]

To check the AUTHENTICATE cell, a responder checks that all fields from TYPE through TLSSECRETS contain their unique correct values as described above, and then verifies the signature. The server MUST ignore any extra bytes in the signed data after the RAND field.
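
For illustration, the TLSSECRETS value for AuthType 1 might be computed as in the Python sketch below. The function name is hypothetical; the sketch assumes that the TLS library exposes the master secret and both random values, and that the three listed inputs are concatenated in the order given above.

```python
import hashlib
import hmac

# The label is NUL terminated, so the trailing zero byte is part of the input.
TLS_XCERT_LABEL = b"Tor V3 handshake TLS cross-certification\x00"


def tlssecrets_type1(master_secret: bytes, client_random: bytes,
                     server_random: bytes) -> bytes:
    """HMAC-SHA256 keyed with the TLS master secret (32-octet result)."""
    message = client_random + server_random + TLS_XCERT_LABEL
    return hmac.new(master_secret, message, hashlib.sha256).digest()


def tlssecrets_matches(received: bytes, expected: bytes) -> bool:
    """Compare the received field against the locally computed value."""
    return hmac.compare_digest(received, expected)
```

A responder would compute the same value from its own view of the TLS session and compare it with the field in the cell before verifying SIG.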

Responders MUST NOT accept this AuthType if the initiator has claimed to have an Ed25519 identity.

(There is no AuthType 2: It was reserved but never implemented.)

Link authentication type 3: Ed25519-SHA256-RFC5705

If AuthType is 3, meaning "Ed25519-SHA256-RFC5705", the Authentication field of the AUTHENTICATE cell is as below:

Relative to AuthType 1, the TYPE, TLSSECRETS, and SIG fields are modified, and the CID_ED and SID_ED fields are new.

       TYPE: The characters "AUTH0003" [8 octets]
       CID: A SHA256 hash of the initiator's RSA1024 identity key [32 octets]
       SID: A SHA256 hash of the responder's RSA1024 identity key [32 octets]
       CID_ED: The initiator's Ed25519 identity key [32 octets]
       SID_ED: The responder's Ed25519 identity key, or all-zero. [32 octets]
       SLOG: A SHA256 hash of all bytes sent from the responder to the
         initiator as part of the negotiation up to and including the
         AUTH_CHALLENGE cell; that is, the VERSIONS cell, the CERTS cell,
         the AUTH_CHALLENGE cell, and any padding cells.  [32 octets]
       CLOG: A SHA256 hash of all bytes sent from the initiator to the
         responder as part of the negotiation so far; that is, the
         VERSIONS cell and the CERTS cell and any padding cells. [32
         octets]
       SCERT: A SHA256 hash of the responder's TLS link certificate. [32
         octets]
       TLSSECRETS: The output of an RFC5705 Exporter function on the
         TLS session, using as its inputs:
          - The label string "EXPORTER FOR TOR TLS CLIENT BINDING AUTH0003"
          - The context value equal to the initiator's Ed25519 identity key.
          - The length 32.
            [32 octets]
       RAND: A 24 byte value, randomly chosen by the initiator. [24 octets]
       SIG: A signature of all previous fields using the initiator's
          Ed25519 authentication key (as in the cert with CertType 6).
          [variable length]

NETINFO cells

If version 2 or higher is negotiated, each party sends the other a NETINFO cell. The cell's payload is:

      TIME       (Timestamp)                     [4 bytes]
      OTHERADDR  (Other OR's address)            [variable]
         ATYPE   (Address type)                  [1 byte]
         ALEN    (Address length)                [1 byte]
         AVAL    (Address value in NBO)          [ALEN bytes]
      NMYADDR    (Number of this OR's addresses) [1 byte]
        NMYADDR times:
          ATYPE   (Address type)                 [1 byte]
          ALEN    (Address length)               [1 byte]
          AVAL    (Address value in NBO)         [ALEN bytes]

   Recognized address types (ATYPE) are:

     [04] IPv4.
     [06] IPv6.

ALEN MUST be 4 when ATYPE is 0x04 (IPv4) and 16 when ATYPE is 0x06 (IPv6). If the ALEN value is wrong for the given ATYPE value, then the provided address should be ignored.

The timestamp is a big-endian unsigned integer number of seconds since the Unix epoch. Implementations MUST ignore unexpected bytes at the end of the cell. Clients SHOULD send "0" as their timestamp, to avoid fingerprinting.
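
The NETINFO layout can be decoded as in the following Python sketch, which is illustrative rather than normative. The function names are hypothetical, and representing an ignored address as None is a choice made for this example.

```python
import ipaddress
import struct


def _read_address(buf: bytes, off: int):
    """Read one ATYPE/ALEN/AVAL triple; return (address or None, new offset)."""
    if off + 2 > len(buf):
        raise ValueError("truncated address header")
    atype, alen = buf[off], buf[off + 1]
    aval = buf[off + 2:off + 2 + alen]
    if len(aval) != alen:
        raise ValueError("truncated address value")
    off += 2 + alen
    if atype == 0x04 and alen == 4:
        return ipaddress.IPv4Address(aval), off
    if atype == 0x06 and alen == 16:
        return ipaddress.IPv6Address(aval), off
    # Unrecognized type or wrong ALEN for the type: ignore this address.
    return None, off


def parse_netinfo(payload: bytes):
    """Return (timestamp, other_addr, my_addrs); trailing bytes are ignored."""
    if len(payload) < 4:
        raise ValueError("truncated NETINFO cell")
    (timestamp,) = struct.unpack_from(">I", payload, 0)
    other_addr, off = _read_address(payload, 4)
    if off >= len(payload):
        raise ValueError("missing NMYADDR")
    n_my_addr = payload[off]
    off += 1
    my_addrs = []
    for _ in range(n_my_addr):
        addr, off = _read_address(payload, off)
        if addr is not None:
            my_addrs.append(addr)
    return timestamp, other_addr, my_addrs
```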

Implementations MAY use the timestamp value to help decide if their clocks are skewed. Initiators MAY use "other OR's address" to help learn which address their connections may be originating from, if they do not know it; and to learn whether the peer will treat the current connection as canonical. Implementations SHOULD NOT trust these values unconditionally, especially when they come from non-authorities, since the other party can lie about the time or IP addresses it sees.

Initiators SHOULD use "this OR's address" to make sure that they have connected to another OR at its canonical address. (See 5.3.1 below.)

Circuit management

CREATE and CREATED cells

Users set up circuits incrementally, one hop at a time. To create a new circuit, OPs send a CREATE/CREATE2 cell to the first node, with the first half of an authenticated handshake; that node responds with a CREATED/CREATED2 cell with the second half of the handshake. To extend a circuit past the first hop, the OP sends an EXTEND/EXTEND2 relay cell (see section 5.1.2) which instructs the last node in the circuit to send a CREATE/CREATE2 cell to extend the circuit.

There are two kinds of CREATE and CREATED cells: The older "CREATE/CREATED" format, and the newer "CREATE2/CREATED2" format. The newer format is extensible by design; the older one is not.

A CREATE2 cell contains:

       HTYPE     (Client Handshake Type)     [2 bytes]
       HLEN      (Client Handshake Data Len) [2 bytes]
       HDATA     (Client Handshake Data)     [HLEN bytes]

   A CREATED2 cell contains:

       HLEN      (Server Handshake Data Len) [2 bytes]
       HDATA     (Server Handshake Data)     [HLEN bytes]

   Recognized HTYPEs (handshake types) are:

       0x0000  TAP  -- the original Tor handshake; see 5.1.3
       0x0001  reserved
       0x0002  ntor -- the ntor+curve25519+sha256 handshake; see 5.1.4
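
As an illustration of the CREATE2 and CREATED2 layouts above, the following Python sketch encodes and decodes both payloads. The function and constant names are hypothetical, and the length fields are assumed to be big-endian.

```python
import struct

HTYPE_TAP = 0x0000
HTYPE_NTOR = 0x0002


def encode_create2(htype: int, hdata: bytes) -> bytes:
    """HTYPE, HLEN, then HLEN bytes of client handshake data."""
    return struct.pack(">HH", htype, len(hdata)) + hdata


def decode_create2(payload: bytes) -> tuple[int, bytes]:
    if len(payload) < 4:
        raise ValueError("truncated CREATE2 payload")
    htype, hlen = struct.unpack_from(">HH", payload, 0)
    hdata = payload[4:4 + hlen]
    if len(hdata) != hlen:
        raise ValueError("HLEN exceeds payload")
    return htype, hdata


def encode_created2(hdata: bytes) -> bytes:
    """HLEN, then HLEN bytes of server handshake data."""
    return struct.pack(">H", len(hdata)) + hdata


def decode_created2(payload: bytes) -> bytes:
    if len(payload) < 2:
        raise ValueError("truncated CREATED2 payload")
    (hlen,) = struct.unpack_from(">H", payload, 0)
    hdata = payload[2:2 + hlen]
    if len(hdata) != hlen:
        raise ValueError("HLEN exceeds payload")
    return hdata
```

Because HLEN is explicit, the decoders tolerate the zero padding that fills the rest of a fixed-length cell.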

   The format of a CREATE cell is one of the following:

       HDATA     (Client Handshake Data)     [TAP_C_HANDSHAKE_LEN bytes]

   or

       HTAG      (Client Handshake Type Tag) [16 bytes]
       HDATA     (Client Handshake Data)     [TAP_C_HANDSHAKE_LEN-16 bytes]

The first format is equivalent to a CREATE2 cell with HTYPE of 'tap' and length of TAP_C_HANDSHAKE_LEN. The second format is a way to encapsulate new handshake types into the old CREATE cell format for migration. See 5.1.2 below. Recognized HTAG values are:

       ntor -- 'ntorNTORntorNTOR'

The format of a CREATED cell is:

       HDATA     (Server Handshake Data)     [TAP_S_HANDSHAKE_LEN bytes]

(It's equivalent to a CREATED2 cell with length of TAP_S_HANDSHAKE_LEN.)

As usual with DH, x and y MUST be generated randomly.

In general, clients SHOULD use CREATE whenever they are using the TAP handshake, and CREATE2 otherwise. Clients SHOULD NOT send the second format of CREATE cells (the one with the handshake type tag) to a server directly.

Servers always reply to a successful CREATE with a CREATED, and to a successful CREATE2 with a CREATED2. On failure, a server sends a DESTROY cell to tear down the circuit.

[CREATE2 is handled by Tor 0.2.4.7-alpha and later.]

Choosing circuit IDs in create cells

The CircID for a CREATE/CREATE2 cell is a nonzero integer, selected by the node (OP or OR) that sends the CREATE/CREATE2 cell. Depending on the link protocol version, there are certain rules for choosing the value of CircID which MUST be obeyed, as implementations MAY decide to refuse in case of a violation. In link protocol 3 or lower, CircIDs are 2 bytes long; in protocol 4 or higher, CircIDs are 4 bytes long.

In link protocol version 3 or lower, the nodes choose from only one half of the possible values based on the ORs' public identity keys, in order to avoid collisions. If the sending node has a lower key, it chooses a CircID with an MSB of 0; otherwise, it chooses a CircID with an MSB of 1. (Public keys are compared numerically by modulus.) A client with no public key MAY choose any CircID it wishes, since clients never need to process CREATE/CREATE2 cells.

In link protocol version 4 or higher, whichever node initiated the connection MUST set its MSB to 1, and whichever node didn't initiate the connection MUST set its MSB to 0.

The CircID value 0 is specifically reserved for cells that do not belong to any circuit: CircID 0 MUST NOT be used for circuits. No other CircID value, including 0x8000 or 0x80000000, is reserved.

Existing Tor implementations choose their CircID values at random from among the available unused values. To avoid distinguishability, new implementations should do the same. Implementations MAY give up and stop attempting to build new circuits on a channel, if a certain number of randomly chosen CircID values are all in use (today's Tor stops after 64).
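
The rules for link protocol version 4 or higher can be sketched in Python as below. This is an illustration only: the function name and parameters are hypothetical, the set of IDs in use is assumed to be tracked by the caller, and returning None is just one way to signal giving up.

```python
import secrets
from typing import Optional


def choose_circ_id(in_use: set[int], is_initiator: bool,
                   max_tries: int = 64) -> Optional[int]:
    """Pick a random unused 4-byte CircID for link protocol 4 or higher."""
    # The connection initiator sets the MSB to 1; the other side sets it to 0.
    msb = 0x80000000 if is_initiator else 0
    for _ in range(max_tries):
        candidate = msb | secrets.randbits(31)
        # CircID 0 is reserved for cells that belong to no circuit.
        if candidate != 0 and candidate not in in_use:
            return candidate
    # Every randomly chosen value was taken: give up on this channel.
    return None
```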

EXTEND and EXTENDED cells

To extend an existing circuit, the client sends an EXTEND or EXTEND2 RELAY_EARLY cell to the last node in the circuit.

An EXTEND2 cell's relay payload contains:

       NSPEC      (Number of link specifiers)     [1 byte]
         NSPEC times:
           LSTYPE (Link specifier type)           [1 byte]
           LSLEN  (Link specifier length)         [1 byte]
           LSPEC  (Link specifier)                [LSLEN bytes]
       HTYPE      (Client Handshake Type)         [2 bytes]
       HLEN       (Client Handshake Data Len)     [2 bytes]
       HDATA      (Client Handshake Data)         [HLEN bytes]

Link specifiers describe the next node in the circuit and how to connect to it. Recognized specifiers are:

      [00] TLS-over-TCP, IPv4 address
           A four-byte IPv4 address plus two-byte ORPort
      [01] TLS-over-TCP, IPv6 address
           A sixteen-byte IPv6 address plus two-byte ORPort
      [02] Legacy identity
           A 20-byte SHA1 identity fingerprint. At most one may be listed.
      [03] Ed25519 identity
           A 32-byte Ed25519 identity fingerprint. At most one may
           be listed.

Nodes MUST ignore unrecognized specifiers, and MUST accept multiple instances of specifiers other than 'legacy identity' and 'Ed25519 identity'. (Nodes SHOULD reject link specifier lists that include multiple instances of either one of those specifiers.)

For purposes of indistinguishability, implementations SHOULD send these link specifiers, if using them, in this order: [00], [02], [03], [01].
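
As a non-normative example, an EXTEND2 relay payload carrying an IPv4 address, a legacy identity, and an Ed25519 identity might be assembled as follows. The function names are hypothetical, and the ORPort and length fields are assumed to be big-endian.

```python
import ipaddress
import struct


def link_spec(lstype: int, body: bytes) -> bytes:
    """Encode one link specifier as LSTYPE, LSLEN, LSPEC."""
    if len(body) > 255:
        raise ValueError("link specifier too long for a one-byte LSLEN")
    return bytes([lstype, len(body)]) + body


def encode_extend2(ipv4: str, orport: int, legacy_id: bytes, ed_id: bytes,
                   htype: int, hdata: bytes) -> bytes:
    if len(legacy_id) != 20 or len(ed_id) != 32:
        raise ValueError("unexpected identity length")
    # Specifiers are emitted in the order [00], [02], [03]; an IPv6
    # specifier [01], if used, would follow these.
    specs = [
        link_spec(0x00, ipaddress.IPv4Address(ipv4).packed
                  + struct.pack(">H", orport)),
        link_spec(0x02, legacy_id),
        link_spec(0x03, ed_id),
    ]
    handshake = struct.pack(">HH", htype, len(hdata)) + hdata
    return bytes([len(specs)]) + b"".join(specs) + handshake
```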

The relay payload for an EXTEND relay cell consists of:

         Address                       [4 bytes]
         Port                          [2 bytes]
         Onion skin                    [TAP_C_HANDSHAKE_LEN bytes]
         Identity fingerprint          [HASH_LEN bytes]

The "legacy identity" and "identity fingerprint" fields are the SHA1 hash of the PKCS#1 ASN1 encoding of the next onion router's identity (signing) key. (See 0.3 above.) The "Ed25519 identity" field is the Ed25519 identity key of the target node. Including this key information allows the extending OR to verify that it is indeed connected to the correct target OR, and prevents certain man-in-the-middle attacks.

Extending ORs MUST check all provided identity keys (if they recognize the format), and MUST NOT extend the circuit if the target OR did not prove its ownership of any such identity key. If only one identity key is provided, but the extending OR knows the other (from directory information), then the OR SHOULD also enforce the key in the directory.

If an extending OR has a channel with a given Ed25519 ID and RSA identity, and receives a request for that Ed25519 ID and a different RSA identity, it SHOULD NOT attempt to make another connection: it should just fail and DESTROY the circuit.

The client MAY include multiple IPv4 or IPv6 link specifiers in an EXTEND cell; current OR implementations only consider the first of each type.

After checking relay identities, extending ORs generate a CREATE/CREATE2 cell from the contents of the EXTEND/EXTEND2 cell. See section 5.3 for details.

The payload of an EXTENDED cell is the same as the payload of a CREATED cell.

The payload of an EXTENDED2 cell is the same as the payload of a CREATED2 cell.

[Support for EXTEND2/EXTENDED2 was added in Tor 0.2.4.8-alpha.]

Clients SHOULD use the EXTEND format whenever sending a TAP handshake, and MUST use it whenever the EXTEND cell will be handled by a node running a version of Tor too old to support EXTEND2. In other cases, clients SHOULD use EXTEND2.

When generating an EXTEND2 cell, clients SHOULD include the target's Ed25519 identity whenever the target has one, and whenever the target supports LinkAuth subprotocol version "3". (See section 9.2.)

When encoding a non-TAP handshake in an EXTEND cell, clients SHOULD use the format with 'client handshake type tag'.

The "TAP" handshake

This handshake uses Diffie-Hellman in Z_p and RSA to compute a set of shared keys which the client knows are shared only with a particular server, and the server knows are shared with whomever sent the original handshake (or with nobody at all). It's not very fast and not very good. (See Goldberg's "On the Security of the Tor Authentication Protocol".)

Define TAP_C_HANDSHAKE_LEN as DH_LEN+KEY_LEN+KP_PAD_LEN. Define TAP_S_HANDSHAKE_LEN as DH_LEN+HASH_LEN.

The payload for a CREATE cell is an 'onion skin', which consists of the first step of the DH handshake data (also known as g^x). This value is encrypted using the "legacy hybrid encryption" algorithm (see 0.4 above) to the server's onion key, giving a client handshake:

       KP-encrypted:
         Padding                       [KP_PAD_LEN bytes]
         Symmetric key                 [KEY_LEN bytes]
         First part of g^x             [KP_ENC_LEN-KP_PAD_LEN-KEY_LEN bytes]
       Symmetrically encrypted:
         Second part of g^x            [DH_LEN-(KP_ENC_LEN-KP_PAD_LEN-KEY_LEN)
                                           bytes]

The payload for a CREATED cell, or the relay payload for an EXTENDED cell, contains:

         DH data (g^y)                 [DH_LEN bytes]
         Derivative key data (KH)      [HASH_LEN bytes]   <see 5.2 below>

Once the handshake between the OP and an OR is completed, both can now calculate g^xy with ordinary DH. Before computing g^xy, both parties MUST verify that the received g^x or g^y value is not degenerate; that is, it must be strictly greater than 1 and strictly less than p-1 where p is the DH modulus. Implementations MUST NOT complete a handshake with degenerate keys. Implementations MUST NOT discard other "weak" g^x values.

(Discarding degenerate keys is critical for security; if bad keys are not discarded, an attacker can substitute the OR's CREATED cell's g^y with 0 or 1, thus creating a known g^xy and impersonating the OR. Discarding other keys may allow attacks to learn bits of the private key.)
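
The degenerate-value test above amounts to a range check. A minimal Python sketch follows; the function name is hypothetical, and the received value is assumed to be encoded as a big-endian unsigned integer.

```python
def dh_value_is_acceptable(value_bytes: bytes, p: int) -> bool:
    """Return True if a received g^x or g^y is not degenerate.

    The value must be strictly greater than 1 and strictly less than p-1,
    where p is the DH modulus. No other "weak" values are rejected.
    """
    value = int.from_bytes(value_bytes, "big")
    return 1 < value < p - 1
```

A party would run this check on the peer's value before computing g^xy, and abandon the handshake if it returns False.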

Once both parties have g^xy, they derive their shared circuit keys and 'derivative key data' value via the KDF-TOR function in 5.2.1.

The "ntor" handshake

This handshake uses a set of DH handshakes to compute a set of shared keys which the client knows are shared only with a particular server, and the server knows are shared with whomever sent the original handshake (or with nobody at all). Here we use the "curve25519" group and representation as specified in "Curve25519: new Diffie-Hellman speed records" by D. J. Bernstein.

[The ntor handshake was added in Tor 0.2.4.8-alpha.]

In this section, define:

      H(x,t) as HMAC_SHA256 with message x and key t.
      H_LENGTH  = 32.
      ID_LENGTH = 20.
      G_LENGTH  = 32
      PROTOID   = "ntor-curve25519-sha256-1"
      t_mac     = PROTOID | ":mac"
      t_key     = PROTOID | ":key_extract"
      t_verify  = PROTOID | ":verify"
      G         = The preferred base point for curve25519 ([9])
      KEYGEN()  = The curve25519 key generation algorithm, returning
                  a private/public keypair.
      m_expand  = PROTOID | ":key_expand"
      KEYID(A)  = A
      EXP(a, b) = The ECDH algorithm for establishing a shared secret.

To perform the handshake, the client needs to know an identity key digest for the server, and an ntor onion key (a curve25519 public key) for that server. Call the ntor onion key "B". The client generates a temporary keypair:

     x,X = KEYGEN()

and generates a client-side handshake with contents:

       NODEID      Server identity digest  [ID_LENGTH bytes]
       KEYID       KEYID(B)                [H_LENGTH bytes]
       CLIENT_KP   X                       [G_LENGTH bytes]

The server generates a keypair of y,Y = KEYGEN(), and uses its ntor private key 'b' to compute:

     secret_input = EXP(X,y) | EXP(X,b) | ID | B | X | Y | PROTOID
     KEY_SEED = H(secret_input, t_key)
     verify = H(secret_input, t_verify)
     auth_input = verify | ID | B | Y | X | PROTOID | "Server"

   The server's handshake reply is:

       SERVER_KP   Y                       [G_LENGTH bytes]
       AUTH        H(auth_input, t_mac)    [H_LENGTH bytes]

   The client then checks Y is in G^* [see NOTE below], and computes

     secret_input = EXP(Y,x) | EXP(B,x) | ID | B | X | Y | PROTOID
     KEY_SEED = H(secret_input, t_key)
     verify = H(secret_input, t_verify)
     auth_input = verify | ID | B | Y | X | PROTOID | "Server"

   The client verifies that AUTH == H(auth_input, t_mac).

Both parties check that none of the EXP() operations produced the point at infinity. [NOTE: This is an adequate replacement for checking Y for group membership, if the group is curve25519.]

Both parties now have a shared value for KEY_SEED. They expand this into the keys needed for the Tor relay protocol, using the KDF described in 5.2.2 and the tag m_expand.
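
The hashing steps of the ntor handshake can be illustrated with the Python sketch below. It deliberately leaves out the curve25519 operations: the two EXP() results are taken as inputs, so only the standard library is needed. The function names are hypothetical, and treating an all-zero EXP() output as the point at infinity is an assumption of this sketch.

```python
import hashlib
import hmac

PROTOID = b"ntor-curve25519-sha256-1"
T_MAC = PROTOID + b":mac"
T_KEY = PROTOID + b":key_extract"
T_VERIFY = PROTOID + b":verify"


def H(x: bytes, t: bytes) -> bytes:
    """HMAC_SHA256 with message x and key t."""
    return hmac.new(t, x, hashlib.sha256).digest()


def ntor_outputs(exp1: bytes, exp2: bytes, node_id: bytes,
                 B: bytes, X: bytes, Y: bytes) -> tuple[bytes, bytes]:
    """Return (KEY_SEED, AUTH) from the two EXP() results.

    The server passes EXP(X,y) and EXP(X,b); the client passes
    EXP(Y,x) and EXP(B,x), which are the same two shared secrets.
    """
    # Assumption: an all-zero output signals the point at infinity.
    if exp1 == bytes(32) or exp2 == bytes(32):
        raise ValueError("EXP() produced the point at infinity")
    secret_input = exp1 + exp2 + node_id + B + X + Y + PROTOID
    key_seed = H(secret_input, T_KEY)
    verify = H(secret_input, T_VERIFY)
    auth_input = verify + node_id + B + Y + X + PROTOID + b"Server"
    return key_seed, H(auth_input, T_MAC)


def client_accepts(auth_from_server: bytes, expected_auth: bytes) -> bool:
    """Compare the received AUTH with the locally computed one."""
    return hmac.compare_digest(auth_from_server, expected_auth)
```

The client would call ntor_outputs with its own EXP() results and accept the reply only if client_accepts returns True for the AUTH field received from the server.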

CREATE_FAST/CREATED_FAST cells

When initializing the first hop of a circuit, the OP has already established the OR's identity and negotiated a secret key using TLS. Because of this, it is not always necessary for the OP to perform the public key operations to create a circuit. In this case, the OP MAY send a CREATE_FAST cell instead of a CREATE cell for the first hop only. The OR responds with a CREATED_FAST cell, and the circuit is created.

A CREATE_FAST cell contains:

       Key material (X)    [HASH_LEN bytes]

A CREATED_FAST cell contains:

       Key material (Y)    [HASH_LEN bytes]
       Derivative key data [HASH_LEN bytes] (See 5.2.1 below)

   The values of X and Y must be generated randomly.

Once both parties have X and Y, they derive their shared circuit keys and 'derivative key data' value via the KDF-TOR function in 5.2.1.

The CREATE_FAST handshake is currently deprecated whenever it is not necessary; the migration is controlled by the "usecreatefast" networkstatus parameter as described in dir-spec.txt.

[Tor 0.3.1.1-alpha and later disable CREATE_FAST by default.]

Setting circuit keys

KDF-TOR

This key derivation function is used by the TAP and CREATE_FAST handshakes, and in the current hidden service protocol. It shouldn't be used for new functionality.

If the TAP handshake is used to extend a circuit, both parties base their key material on K0=g^xy, represented as a big-endian unsigned integer.

If CREATE_FAST is used, both parties base their key material on K0=X|Y.

From the base key material K0, they compute KEY_LEN*2+HASH_LEN*3 bytes of derivative key data as

     K = H(K0 | [00]) | H(K0 | [01]) | H(K0 | [02]) | ...

The first HASH_LEN bytes of K form KH; the next HASH_LEN form the forward digest Df; the next HASH_LEN (bytes 41-60) form the backward digest Db; the next KEY_LEN (bytes 61-76) form Kf, and the final KEY_LEN form Kb. Excess bytes from K are discarded.

KH is used in the handshake response to demonstrate knowledge of the computed shared key. Df is used to seed the integrity-checking hash for the stream of data going from the OP to the OR, and Db seeds the integrity-checking hash for the data stream from the OR to the OP. Kf is used to encrypt the stream of data going from the OP to the OR, and Kb is used to encrypt the stream of data going from the OR to the OP.
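
For illustration, KDF-TOR might be implemented as in the Python sketch below. The function name is hypothetical; the sketch assumes that H is SHA-1 and that HASH_LEN is 20 and KEY_LEN is 16, as given in section 0.

```python
import hashlib

HASH_LEN = 20  # assumed: length of a SHA-1 digest
KEY_LEN = 16   # assumed: length of a stream cipher key


def kdf_tor(k0: bytes) -> tuple[bytes, bytes, bytes, bytes, bytes]:
    """Derive (KH, Df, Db, Kf, Kb) from the base key material K0."""
    needed = KEY_LEN * 2 + HASH_LEN * 3
    k = b""
    counter = 0
    while len(k) < needed:
        # K = H(K0 | [00]) | H(K0 | [01]) | H(K0 | [02]) | ...
        k += hashlib.sha1(k0 + bytes([counter])).digest()
        counter += 1
    kh = k[0:HASH_LEN]
    df = k[HASH_LEN:2 * HASH_LEN]
    db = k[2 * HASH_LEN:3 * HASH_LEN]
    kf = k[3 * HASH_LEN:3 * HASH_LEN + KEY_LEN]
    kb = k[3 * HASH_LEN + KEY_LEN:needed]
    # Bytes of K beyond `needed` are discarded.
    return kh, df, db, kf, kb
```

For CREATE_FAST the caller would pass K0 = X|Y; for TAP it would pass g^xy encoded as a big-endian unsigned integer.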

KDF-RFC5869

For newer KDF needs, Tor uses the key derivation function HKDF from RFC5869, instantiated with SHA256. (This is due to a construction from Krawczyk.) The generated key material is:

       K = K_1 | K_2 | K_3 | ...

       Where H(x,t) is HMAC_SHA256 with value x and key t
         and K_1     = H(m_expand | INT8(1) , KEY_SEED )
         and K_(i+1) = H(K_i | m_expand | INT8(i+1) , KEY_SEED )
         and m_expand is an arbitrarily chosen value,
         and INT8(i) is an octet with the value "i".

In RFC5869's vocabulary, this is HKDF-SHA256 with info == m_expand, salt == t_key, and IKM == secret_input.

When used in the ntor handshake, the first HASH_LEN bytes form the forward digest Df; the next HASH_LEN form the backward digest Db; the next KEY_LEN form Kf, the next KEY_LEN form Kb, and the final DIGEST_LEN bytes are taken as a nonce to use in the place of KH in the hidden service protocol. Excess bytes from K are discarded.
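
The expansion step and the key split described above might be sketched in Python as follows. The function names are hypothetical; HASH_LEN = 20 and KEY_LEN = 16 are assumed from section 0, and the nonce length is left as a caller-supplied parameter because DIGEST_LEN is not defined in this section.

```python
import hashlib
import hmac

HASH_LEN = 20  # assumed value
KEY_LEN = 16   # assumed value


def H(x: bytes, t: bytes) -> bytes:
    """HMAC_SHA256 with value x and key t."""
    return hmac.new(t, x, hashlib.sha256).digest()


def kdf_rfc5869(key_seed: bytes, m_expand: bytes, length: int) -> bytes:
    """Return the first `length` bytes of K = K_1 | K_2 | K_3 | ..."""
    out = b""
    prev = b""  # K_1 has no preceding block
    i = 1
    while len(out) < length:
        if i > 255:
            raise ValueError("INT8 counter overflow")
        prev = H(prev + m_expand + bytes([i]), key_seed)
        out += prev
        i += 1
    return out[:length]


def split_ntor_keys(key_seed: bytes, m_expand: bytes, nonce_len: int):
    """Return (Df, Db, Kf, Kb, nonce) in the order given above."""
    k = kdf_rfc5869(key_seed, m_expand, 2 * HASH_LEN + 2 * KEY_LEN + nonce_len)
    df, k = k[:HASH_LEN], k[HASH_LEN:]
    db, k = k[:HASH_LEN], k[HASH_LEN:]
    kf, k = k[:KEY_LEN], k[KEY_LEN:]
    kb, nonce = k[:KEY_LEN], k[KEY_LEN:]
    return df, db, kf, kb, nonce
```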

Creating circuits

When creating a circuit through the network, the circuit creator (OP) performs the following steps:

      1. Choose an onion router as an end node (R_N):
         * N MAY be 1 for non-anonymous directory mirror, introduction point,
           or service rendezvous connections.
         * N SHOULD be 3 or more for anonymous connections.
         Some end nodes accept streams (see 6.1), others are introduction
         or rendezvous points (see rend-spec-{v2,v3}.txt).

      2. Choose a chain of (N-1) onion routers (R_1...R_N-1) to constitute
         the path, such that no router appears in the path twice.

      3. If not already connected to the first router in the chain,
         open a new connection to that router.

      4. Choose a circID not already in use on the connection with the
         first router in the chain; send a CREATE/CREATE2 cell along
         the connection, to be received by the first onion router.

      5. Wait until a CREATED/CREATED2 cell is received; finish the
         handshake and extract the forward key Kf_1 and the backward
         key Kb_1.

      6. For each subsequent onion router R (R_2 through R_N), extend
         the circuit to R.

To extend the circuit by a single onion router R_M, the OP performs these steps:

      1. Create an onion skin, encrypted to R_M's public onion key.

      2. Send the onion skin in a relay EXTEND/EXTEND2 cell along
         the circuit (see sections 5.1.2 and 5.5).

      3. When a relay EXTENDED/EXTENDED2 cell is received, verify KH,
         and calculate the shared keys.  The circuit is now extended.

When an onion router receives an EXTEND relay cell, it sends a CREATE cell to the next onion router, with the enclosed onion skin as its payload.

When an onion router receives an EXTEND2 relay cell, it sends a CREATE2 cell to the next onion router, with the enclosed HLEN, HTYPE, and HDATA as its payload. The initiating onion router chooses some circID not yet used on the connection between the two onion routers. (But see section 5.1.1 above, concerning choosing circIDs.)

As special cases, if the EXTEND/EXTEND2 cell includes a legacy identity, or identity fingerprint of all zeroes, or asks to extend back to the relay that sent the extend cell, the circuit will fail and be torn down.

Ed25519 identity keys are not required in EXTEND2 cells, so all zero keys SHOULD be accepted. If the extending relay knows the ed25519 key from the consensus, it SHOULD also check that key. (See section 5.1.2.)

If an EXTEND2 cell contains the ed25519 key of the relay that sent the extend cell, the circuit will fail and be torn down.

When an onion router receives a CREATE/CREATE2 cell, if it already has a circuit on the given connection with the given circID, it drops the cell. Otherwise, after receiving the CREATE/CREATE2 cell, it completes the specified handshake, and replies with a CREATED/CREATED2 cell.

Upon receiving a CREATED/CREATED2 cell, an onion router packs its payload into an EXTENDED/EXTENDED2 relay cell (see section 5.1.2), and sends that cell up the circuit. Upon receiving the EXTENDED/EXTENDED2 relay cell, the OP can retrieve the handshake material.

(As an optimization, OR implementations may delay processing onions until a break in traffic allows time to do so without harming network latency too greatly.)

Canonical connections

It is possible for an attacker to launch a man-in-the-middle attack against a connection by telling OR Alice to extend to OR Bob at some address X controlled by the attacker. The attacker cannot read the encrypted traffic, but the attacker is now in a position to count all bytes sent between Alice and Bob (assuming Alice was not already connected to Bob).

To prevent this, when an OR gets an extend request, it SHOULD use an existing OR connection if the ID matches, and ANY of the following conditions hold:

       - The IP matches the requested IP.
       - The OR knows that the IP of the connection it's using is canonical
         because it was listed in the NETINFO cell.

    ORs SHOULD NOT check the IPs that are listed in the server descriptor.
    Trusting server IPs makes it easier to covertly impersonate a relay, after
    stealing its keys.
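
The reuse rule above can be sketched as follows. This is a minimal illustration, not the reference implementation: the OrConnection record and its field names are hypothetical, and the "canonical" test is modeled simply as the connection's address appearing among the addresses the peer listed in its NETINFO cell.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OrConnection:
    """Hypothetical record of an open OR connection (names are illustrative)."""
    peer_id: bytes                                      # identity the peer authenticated with
    peer_ip: str                                        # address the connection goes to
    netinfo_ips: Set[str] = field(default_factory=set)  # addresses the peer listed in NETINFO


def may_reuse(conn: OrConnection, wanted_id: bytes, wanted_ip: str) -> bool:
    """Return True if an extend request may use this existing connection."""
    if conn.peer_id != wanted_id:
        return False
    ip_matches = conn.peer_ip == wanted_ip
    is_canonical = conn.peer_ip in conn.netinfo_ips
    # ANY of the conditions is enough once the ID matches.
    return ip_matches or is_canonical
```

Note that the server descriptor is deliberately not consulted here, in line with the SHOULD NOT above.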

Tearing down circuits

Circuits are torn down when an unrecoverable error occurs along the circuit, or when all streams on a circuit are closed and the circuit's intended lifetime is over.

ORs SHOULD also tear down circuits which attempt to create:

   * streams with RELAY_BEGIN, or
   * rendezvous points with ESTABLISH_RENDEZVOUS,

ending at the first hop. Letting Tor be used as a single hop proxy makes exit and rendezvous nodes a more attractive target for compromise.

ORs MAY use multiple methods to check if they are the first hop:

   * If an OR sees a circuit created with CREATE_FAST, the OR is sure to be
     the first hop of a circuit.
   * If an OR is the responder, and the initiator:
     * did not authenticate the link, or
     * authenticated with a key that is not in the consensus,
     then the OR is probably the first hop of a circuit (or the second hop of
     a circuit via a bridge relay).

   Circuits may be torn down either completely or hop-by-hop.

To tear down a circuit completely, an OR or OP sends a DESTROY cell to the adjacent nodes on that circuit, using the appropriate direction's circID.

Upon receiving an outgoing DESTROY cell, an OR frees resources associated with the corresponding circuit. If it's not the end of the circuit, it sends a DESTROY cell for that circuit to the next OR in the circuit. If the node is the end of the circuit, then it tears down any associated edge connections (see section 6.1).

After a DESTROY cell has been processed, an OR ignores all data or destroy cells for the corresponding circuit.

To tear down part of a circuit, the OP may send a RELAY_TRUNCATE cell signaling a given OR (Stream ID zero). That OR sends a DESTROY cell to the next node in the circuit, and replies to the OP with a RELAY_TRUNCATED cell.

[Note: If an OR receives a TRUNCATE cell and it has any RELAY cells still queued on the circuit for the next node it will drop them without sending them. This is not considered conformant behavior, but it probably won't get fixed until a later version of Tor. Thus, clients SHOULD NOT send a TRUNCATE cell to a node running any current version of Tor if a) they have sent relay cells through that node, and b) they aren't sure whether those cells have been sent on yet.]

   When an unrecoverable error occurs along a circuit, the nodes
   must report it as follows:
     * If possible, send a DESTROY cell to ORs _away_ from the client.
     * If possible, send *either* a DESTROY cell towards the client, or
       a RELAY_TRUNCATED cell towards the client.

Current versions of Tor do not reuse truncated circuits: An OP, upon receiving a RELAY_TRUNCATED, will send forward a DESTROY cell in order to entirely tear down the circuit. Because of this, we recommend that relays send DESTROY towards the client, not RELAY_TRUNCATED.

   NOTE:
     In tor versions before 0.4.5.13, 0.4.6.11 and 0.4.7.9, relays would
     handle an inbound DESTROY by sending the client a RELAY_TRUNCATED
     message.  Beginning with those versions, relays now propagate
     DESTROY cells in either direction, in order to tell every
     intermediary OR to stop queuing data on the circuit.  The earlier
     behavior created queuing pressure on the intermediary ORs.

The payload of a DESTROY and RELAY_TRUNCATED cell contains a single octet, describing the reason that the circuit was closed. RELAY_TRUNCATED cells, and DESTROY cells sent _towards_ the client, should contain the actual reason from the list of error codes below. Reasons in DESTROY cells SHOULD NOT be propagated downward or upward, due to potential side channel risk: An OR receiving a DESTROY command should use the DESTROYED reason for its next cell. An OP should always use the NONE reason for its own DESTROY cells.

The error codes are:

     0 -- NONE            (No reason given.)
     1 -- PROTOCOL        (Tor protocol violation.)
     2 -- INTERNAL        (Internal error.)
     3 -- REQUESTED       (A client sent a TRUNCATE command.)
     4 -- HIBERNATING     (Not currently operating; trying to save bandwidth.)
     5 -- RESOURCELIMIT   (Out of memory, sockets, or circuit IDs.)
     6 -- CONNECTFAILED   (Unable to reach relay.)
     7 -- OR_IDENTITY     (Connected to relay, but its OR identity was not
                           as expected.)
     8 -- CHANNEL_CLOSED  (The OR connection that was carrying this circuit
                           died.)
     9 -- FINISHED        (The circuit has expired for being dirty or old.)
    10 -- TIMEOUT         (Circuit construction took too long)
    11 -- DESTROYED       (The circuit was destroyed w/o client TRUNCATE)
    12 -- NOSUCHSERVICE   (Request for unknown hidden service)
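
As an illustration, the error codes and the reason-selection rules above can be written down as follows. This is a sketch of one reading of the rules; the function name and its parameters are hypothetical and do not come from any implementation.

```python
from enum import IntEnum


class DestroyReason(IntEnum):
    NONE = 0
    PROTOCOL = 1
    INTERNAL = 2
    REQUESTED = 3
    HIBERNATING = 4
    RESOURCELIMIT = 5
    CONNECTFAILED = 6
    OR_IDENTITY = 7
    CHANNEL_CLOSED = 8
    FINISHED = 9
    TIMEOUT = 10
    DESTROYED = 11
    NOSUCHSERVICE = 12


def outgoing_destroy_reason(is_op: bool,
                            forwarding_received_destroy: bool,
                            local_reason: DestroyReason) -> DestroyReason:
    """Pick the reason byte for a DESTROY cell this node is about to send."""
    if is_op:
        # An OP always uses NONE for its own DESTROY cells.
        return DestroyReason.NONE
    if forwarding_received_destroy:
        # Do not propagate the received reason; use DESTROYED instead.
        return DestroyReason.DESTROYED
    # The OR itself decided to close the circuit: report the actual reason.
    return local_reason
```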

Routing relay cells

Circuit ID Checks

When a node wants to send a RELAY or RELAY_EARLY cell, it checks the cell's circID and determines whether the corresponding circuit along that connection is still open. If not, the node drops the cell.

When a node receives a RELAY or RELAY_EARLY cell, it checks the cell's circID and determines whether it has a corresponding circuit along that connection. If not, the node drops the cell.

Forward Direction

The forward direction is the direction that CREATE/CREATE2 cells are sent.

Routing from the Origin

When a relay cell is sent from an OP, the OP encrypts the payload with the stream cipher as follows:

      OP sends relay cell:
         For I=N...1, where N is the destination node:
            Encrypt with Kf_I.
         Transmit the encrypted cell to node 1.

Relaying Forward at Onion Routers

When a forward relay cell is received by an OR, it decrypts the payload with the stream cipher, as follows:

      'Forward' relay cell:
         Use Kf as key; decrypt.

The OR then decides whether it recognizes the relay cell, by inspecting the payload as described in section 6.1 below. If the OR recognizes the cell, it processes the contents of the relay cell. Otherwise, it passes the decrypted relay cell along the circuit if the circuit continues. If the OR at the end of the circuit encounters an unrecognized relay cell, an error has occurred: the OR sends a DESTROY cell to tear down the circuit.
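
The forward-direction layering described above can be sketched as follows. The ToyStream class is a deliberately insecure stand-in (XOR with a SHA-256 counter keystream) for the real stream cipher; it is an assumption made only so that the example is self-contained, and all names here are hypothetical. Each side keeps one cipher state per hop, so that the OP's Kf_I and hop I's Kf stay in step.

```python
import hashlib
from typing import List


class ToyStream:
    """Insecure stand-in for the stream cipher; it only shows the layering."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.counter = 0
        self.buffer = b""

    def apply(self, data: bytes) -> bytes:
        # Extend the keystream until it covers the data, then XOR.
        while len(self.buffer) < len(data):
            block = hashlib.sha256(self.key + self.counter.to_bytes(8, "big")).digest()
            self.buffer += block
            self.counter += 1
        keystream, self.buffer = self.buffer[:len(data)], self.buffer[len(data):]
        return bytes(a ^ b for a, b in zip(data, keystream))


def op_encrypt_forward(payload: bytes, kf: List[ToyStream], dest: int) -> bytes:
    """For I = N...1, where N = dest (1-based): encrypt with Kf_I."""
    for i in range(dest, 0, -1):
        payload = kf[i - 1].apply(payload)
    return payload


def relay_forward(payload: bytes, kf_hop: ToyStream) -> bytes:
    """An OR removes exactly one layer using its own Kf."""
    return kf_hop.apply(payload)
```

A cell addressed to hop 2 of a three-hop circuit is never processed with Kf_3 by either side, so hop 3's cipher state is left untouched.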

For more information, see section 6 below.

Backward Direction

The backward direction is the opposite direction from CREATE/CREATE2 cells.

Relaying Backward at Onion Routers

When a backward relay cell is received by an OR, it encrypts the payload with the stream cipher, as follows:

      'Backward' relay cell:
         Use Kb as key; encrypt.

Routing to the Origin

When a relay cell arrives at an OP, the OP decrypts the payload with the stream cipher as follows:

         OP receives relay cell from node 1:
            For I=1...N, where N is the final node on the circuit:
                Decrypt with Kb_I.
                If the payload is recognized (see section 6.1), then:
                    The sending node is I.
                    Stop and process the payload.

Handling relay_early cells

A RELAY_EARLY cell is designed to limit the length any circuit can reach. When an OR receives a RELAY_EARLY cell, and the next node in the circuit is speaking v2 of the link protocol or later, the OR relays the cell as a RELAY_EARLY cell. Otherwise, older Tors will relay it as a RELAY cell.

If a node ever receives more than 8 RELAY_EARLY cells on a given outbound circuit, it SHOULD close the circuit. If it receives any inbound RELAY_EARLY cells, it MUST close the circuit immediately.

When speaking v2 of the link protocol or later, clients MUST only send EXTEND/EXTEND2 cells inside RELAY_EARLY cells. Clients SHOULD send the first ~8 RELAY cells that are not targeted at the first hop of any circuit as RELAY_EARLY cells too, in order to partially conceal the circuit length.

[Starting with Tor 0.2.3.11-alpha, relays should reject any EXTEND/EXTEND2 cell not received in a RELAY_EARLY cell.]
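
A relay-side check for the RELAY_EARLY limits above might look like the following sketch. The class and method names are hypothetical; only the limit of 8 and the inbound rule come from the text.

```python
MAX_RELAY_EARLY = 8  # limit given in the text above


class RelayEarlyCounter:
    """Per-circuit count of RELAY_EARLY cells seen in the outbound direction."""

    def __init__(self) -> None:
        self.seen_outbound = 0

    def on_relay_early(self, outbound: bool) -> bool:
        """Return True if the circuit should be closed."""
        if not outbound:
            # Any inbound RELAY_EARLY cell: MUST close immediately.
            return True
        self.seen_outbound += 1
        # More than 8 outbound RELAY_EARLY cells: SHOULD close.
        return self.seen_outbound > MAX_RELAY_EARLY
```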

Application connections and stream management

Relay cells

Within a circuit, the OP and the end node use the contents of RELAY packets to tunnel end-to-end commands and TCP connections ("Streams") across circuits. End-to-end commands can be initiated by either edge; streams are initiated by the OP.

End nodes that accept streams may be:

   * exit relays (RELAY_BEGIN, anonymous),
   * directory servers (RELAY_BEGIN_DIR, anonymous or non-anonymous),
   * onion services (RELAY_BEGIN, anonymous via a rendezvous point).

The payload of each unencrypted RELAY cell consists of:

         Relay command           [1 byte]
         'Recognized'            [2 bytes]
         StreamID                [2 bytes]
         Digest                  [4 bytes]
         Length                  [2 bytes]
         Data                    [Length bytes]
         Padding                 [PAYLOAD_LEN - 11 - Length bytes]
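
For illustration, the layout above maps onto an 11-byte big-endian header followed by data and padding. The sketch below assumes PAYLOAD_LEN is 509 (taken from section 3, not restated here); the helper names are hypothetical, and the padding follows the recommendation given later in this section (four zero bytes, then random bytes).

```python
import os
import struct

PAYLOAD_LEN = 509  # assumption: value defined in section 3
# command, recognized, stream ID, digest, length
RELAY_HEADER = struct.Struct(">BHH4sH")


def make_padding(n: int) -> bytes:
    """Up to four zero bytes, followed by random bytes."""
    return b"\x00" * min(n, 4) + os.urandom(max(n - 4, 0))


def pack_relay_payload(command: int, stream_id: int, data: bytes,
                       digest: bytes = b"\x00" * 4) -> bytes:
    max_data = PAYLOAD_LEN - RELAY_HEADER.size
    if len(data) > max_data:
        raise ValueError("data too long for one relay cell")
    header = RELAY_HEADER.pack(command, 0, stream_id, digest, len(data))
    return header + data + make_padding(max_data - len(data))


def unpack_relay_payload(payload: bytes):
    command, recognized, stream_id, digest, length = RELAY_HEADER.unpack_from(payload)
    if length > PAYLOAD_LEN - RELAY_HEADER.size:
        raise ValueError("length field exceeds payload size")
    data = payload[RELAY_HEADER.size:RELAY_HEADER.size + length]
    return command, recognized, stream_id, digest, data
```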

   The relay commands are:

         1 -- RELAY_BEGIN     [forward]
         2 -- RELAY_DATA      [forward or backward]
         3 -- RELAY_END       [forward or backward]
         4 -- RELAY_CONNECTED [backward]
         5 -- RELAY_SENDME    [forward or backward] [sometimes control]
         6 -- RELAY_EXTEND    [forward]             [control]
         7 -- RELAY_EXTENDED  [backward]            [control]
         8 -- RELAY_TRUNCATE  [forward]             [control]
         9 -- RELAY_TRUNCATED [backward]            [control]
        10 -- RELAY_DROP      [forward or backward] [control]
        11 -- RELAY_RESOLVE   [forward]
        12 -- RELAY_RESOLVED  [backward]
        13 -- RELAY_BEGIN_DIR [forward]
        14 -- RELAY_EXTEND2   [forward]             [control]
        15 -- RELAY_EXTENDED2 [backward]            [control]

        16..18 -- Reserved for UDP; Not yet in use, see prop339.

        19..22 -- Reserved for Conflux, see prop329.

        32..40 -- Used for hidden services; see rend-spec-{v2,v3}.txt.

        41..42 -- Used for circuit padding; see Section 3 of padding-spec.txt.

        Used for flow control; see Section 4 of prop324.
        43 -- XON             [forward or backward]
        44 -- XOFF            [forward or backward]

Commands labelled as "forward" must only be sent by the originator of the circuit. Commands labelled as "backward" must only be sent by other nodes in the circuit back to the originator. Commands marked as either can be sent either by the originator or other nodes.

The 'recognized' field is used as a simple indication that the cell is still encrypted. It is an optimization to avoid calculating expensive digests for every cell. When sending cells, the unencrypted 'recognized' MUST be set to zero.

When receiving and decrypting cells the 'recognized' will always be zero if we're the endpoint that the cell is destined for. For cells that we should relay, the 'recognized' field will usually be nonzero, but will accidentally be zero with P=2^-16.

When handling a relay cell, if the 'recognized' field in a decrypted relay payload is zero, the 'digest' field is computed as the first four bytes of the running digest of all the bytes that have been destined for this hop of the circuit or originated from this hop of the circuit, seeded from Df or Db respectively (obtained in section 5.2 above), and including this RELAY cell's entire payload (taken with the digest field set to zero). Note that these digests do include the padding bytes at the end of the cell, not only those up to 'Length'. If the digest is correct, the cell is considered "recognized" for the purposes of decryption (see section 5.5 above).

(The digest does not include any bytes from relay cells that do not start or end at this hop of the circuit. That is, it does not include forwarded data. Therefore if 'recognized' is zero but the digest does not match, the running digest at that node should not be updated, and the cell should be forwarded on.)

All RELAY cells pertaining to the same tunneled stream have the same stream ID. StreamIDs are chosen arbitrarily by the OP. No stream may have a StreamID of zero. Rather, RELAY cells that affect the entire circuit rather than a particular stream use a StreamID of zero -- they are marked in the table above as "[control]" style cells. (Sendme cells are marked as "sometimes control" because they can include a StreamID or not depending on their purpose -- see Section 7.)

The 'Length' field of a relay cell contains the number of bytes in the relay payload which contain real payload data. The remainder of the unencrypted payload is padded with padding bytes. Implementations handle padding bytes of unencrypted relay cells as they do padding bytes for other cell types; see Section 3.

The 'Padding' field is used to make relay cell contents unpredictable, to avoid certain attacks (see proposal 289 for rationale). Implementations SHOULD fill this field with four zero-valued bytes, followed by as many random bytes as will fit. (If there are fewer than 4 bytes for padding, then they should all be filled with zero.)

Implementations MUST NOT rely on the contents of the 'Padding' field.

If the RELAY cell is recognized but the relay command is not understood, the cell must be dropped and ignored. Its contents still count with respect to the digests and flow control windows, though.

Calculating the 'Digest' field

The 'Digest' field is used to check whether a cell has been fully decrypted, that is, whether all onion layers have been removed. The 'Recognized' field alone is not sufficient for this, as outlined above.

When ENCRYPTING a RELAY cell, an implementation does the following:

     # Encode the cell in binary (recognized and digest set to zero)
     tmp = cmd + [0, 0] + stream_id + [0, 0, 0, 0] + length + data + padding

     # Update the digest with the encoded data
     digest_state = hash_update(digest_state, tmp)
     digest = hash_calculate(digest_state)

     # The encoded data is the same as above with the digest field not being
     # zero anymore
     encoded = cmd + [0, 0] + stream_id + digest[0..4] + length + data +
               padding

     # Now we can encrypt the cell by adding the onion layers ...

   When DECRYPTING a RELAY cell, an implementation does the following:

     decrypted = decrypt(cell)

     # Replace the digest field in decrypted by zeros
     tmp = decrypted[0..5] + [0, 0, 0, 0] + decrypted[9..]

     # Update the digest field with the decrypted data and its digest field
     # set to zero
     digest_state = hash_update(digest_state, tmp)
     digest = hash_calculate(digest_state)

     if digest[0..4] == decrypted[5..9]
       # The cell has been fully decrypted ...

Note that only the binary data with the digest bytes set to zero is taken into account when calculating the running digest. The final plain-text cells (with the digest field set to its actual value) are not fed into the running digest.
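
The two procedures above can be made concrete with Python's hashlib. This is a sketch under stated assumptions: SHA-1 is used as the running digest, the seed stands in for Df or Db, and the function names are hypothetical. The receiver works on a copy of the digest state so that a cell which is not recognized leaves the running digest untouched.

```python
import hashlib

ZERO_DIGEST = b"\x00" * 4


def seal(digest_state, cell_zero_digest: bytes) -> bytes:
    """Sender side: update the running digest, then fill in the digest field."""
    digest_state.update(cell_zero_digest)
    digest = digest_state.digest()[:4]
    return cell_zero_digest[:5] + digest + cell_zero_digest[9:]


def check(digest_state, decrypted: bytes):
    """Receiver side: return (recognized, digest state to keep using)."""
    if decrypted[1:3] != b"\x00\x00":
        return False, digest_state          # 'recognized' is nonzero
    zeroed = decrypted[:5] + ZERO_DIGEST + decrypted[9:]
    trial = digest_state.copy()             # do not commit until verified
    trial.update(zeroed)
    if trial.digest()[:4] != decrypted[5:9]:
        return False, digest_state          # mismatch: keep the old state
    return True, trial
```

Both ends would start from the same seed, for example hashlib.sha1(seed), and feed cells through seal() and check() in order.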

Opening streams and transferring data

To open a new anonymized TCP connection, the OP chooses an open circuit to an exit that may be able to connect to the destination address, selects an arbitrary StreamID not yet used on that circuit, and constructs a RELAY_BEGIN cell with a payload encoding the address and port of the destination host. The payload format is:

         ADDRPORT [nul-terminated string]
         FLAGS    [4 bytes]

   ADDRPORT is made of ADDRESS | ':' | PORT | [00]

where ADDRESS can be a DNS hostname, or an IPv4 address in dotted-quad format, or an IPv6 address surrounded by square brackets; and where PORT is a decimal integer between 1 and 65535, inclusive.

The ADDRPORT string SHOULD be sent in lower case, to avoid fingerprinting. Implementations MUST accept strings in any case.

The FLAGS value has one or more of the following bits set, where "bit 1" is the LSB of the 32-bit value, and "bit 32" is the MSB. (Remember that all values in Tor are big-endian (see 0.1.1 above), so the MSB of a 4-byte value is the MSB of the first byte, and the LSB of a 4-byte value is the LSB of its last byte.)

     bit   meaning
      1 -- IPv6 okay.  We support learning about IPv6 addresses and
           connecting to IPv6 addresses.
      2 -- IPv4 not okay.  We don't want to learn about IPv4 addresses
           or connect to them.
      3 -- IPv6 preferred.  If there are both IPv4 and IPv6 addresses,
           we want to connect to the IPv6 one.  (By default, we connect
           to the IPv4 address.)
      4..32 -- Reserved. Current clients MUST NOT set these. Servers
           MUST ignore them.
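
As a sketch, a RELAY_BEGIN payload could be encoded like this. The constant and function names are hypothetical; "bit 1" is the least significant bit, and the caller is expected to pass IPv6 literals already wrapped in square brackets.

```python
import struct

FLAG_IPV6_OKAY = 1 << 0       # "bit 1"
FLAG_IPV4_NOT_OKAY = 1 << 1   # "bit 2"
FLAG_IPV6_PREFERRED = 1 << 2  # "bit 3"


def encode_begin(address: str, port: int, flags: int = 0) -> bytes:
    """Build ADDRPORT (lower-cased, NUL-terminated) followed by 4-byte FLAGS."""
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    if flags & ~0x7:
        raise ValueError("reserved flag bits must not be set")
    addrport = f"{address}:{port}".lower().encode("ascii")
    return addrport + b"\x00" + struct.pack(">I", flags)
```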

Upon receiving this cell, the exit node resolves the address as necessary, and opens a new TCP connection to the target port. If the address cannot be resolved, or a connection can't be established, the exit node replies with a RELAY_END cell. (See 6.3 below.) Otherwise, the exit node replies with a RELAY_CONNECTED cell, whose payload is in one of the following formats:

       The IPv4 address to which the connection was made [4 octets]
       A number of seconds (TTL) for which the address may be cached [4 octets]

    or

       Four zero-valued octets [4 octets]
       An address type (6)     [1 octet]
       The IPv6 address to which the connection was made [16 octets]
       A number of seconds (TTL) for which the address may be cached [4 octets]

[Tor exit nodes before 0.1.2.0 set the TTL field to a fixed value. Later versions set the TTL to the last value seen from a DNS server, and expire their own cached entries after a fixed interval. This prevents certain attacks.]
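
A client might parse the two RELAY_CONNECTED formats above as in the following sketch. The function name is hypothetical, and returning None for any body that fits neither format (including an all-zero body) is a choice made for this example.

```python
import ipaddress
import struct


def parse_connected(body: bytes):
    """Return (address, ttl), or None if the body holds no usable address."""
    if len(body) >= 25 and body[:4] == b"\x00" * 4 and body[4] == 6:
        # Four zero octets, address type 6, IPv6 address, TTL.
        addr = ipaddress.IPv6Address(body[5:21])
        (ttl,) = struct.unpack(">I", body[21:25])
        return addr, ttl
    if len(body) >= 8 and body[:4] != b"\x00" * 4:
        # IPv4 address, TTL.
        addr = ipaddress.IPv4Address(body[:4])
        (ttl,) = struct.unpack(">I", body[4:8])
        return addr, ttl
    return None
```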

Once a connection has been established, the OP and exit node package stream data in RELAY_DATA cells, and upon receiving such cells, echo their contents to the corresponding TCP stream.

If the exit node does not support optimistic data (i.e. its version number is before 0.2.3.1-alpha), then the OP MUST wait for a RELAY_CONNECTED cell before sending any data. If the exit node supports optimistic data (i.e. its version number is 0.2.3.1-alpha or later), then the OP MAY send RELAY_DATA cells immediately after sending the RELAY_BEGIN cell (and before receiving either a RELAY_CONNECTED or RELAY_END cell).

RELAY_DATA cells sent to unrecognized streams are dropped. If the exit node supports optimistic data, then RELAY_DATA cells it receives on streams which have seen RELAY_BEGIN but have not yet been replied to with a RELAY_CONNECTED or RELAY_END are queued. If the stream creation succeeds with a RELAY_CONNECTED, the queue is processed immediately afterwards; if the stream creation fails with a RELAY_END, the contents of the queue are deleted.

RELAY_DROP cells are long-range dummies; upon receiving such a cell, the OR or OP must drop it.

Opening a directory stream

If a Tor relay is a directory server, it should respond to a RELAY_BEGIN_DIR cell as if it had received a BEGIN cell requesting a connection to its directory port. RELAY_BEGIN_DIR cells ignore exit policy, since the stream is local to the Tor process.

Directory servers may be:

   * authoritative directories (RELAY_BEGIN_DIR, usually non-anonymous),
   * bridge authoritative directories (RELAY_BEGIN_DIR, anonymous),
   * directory mirrors (RELAY_BEGIN_DIR, usually non-anonymous),
   * onion service directories (RELAY_BEGIN_DIR, anonymous).

If the Tor relay is not running a directory service, it should respond with a REASON_NOTDIRECTORY RELAY_END cell.

Clients MUST generate an all-zero payload for RELAY_BEGIN_DIR cells, and relays MUST ignore the payload.

In response to a RELAY_BEGIN_DIR cell, relays respond either with a RELAY_CONNECTED cell on success, or a RELAY_END cell on failure. They MUST send a RELAY_CONNECTED cell with an all-zero payload, and clients MUST ignore the payload.

[RELAY_BEGIN_DIR was not supported before Tor 0.1.2.2-alpha; clients SHOULD NOT send it to routers running earlier versions of Tor.]

Closing streams

When an anonymized TCP connection is closed, or an edge node encounters an error on any stream, it sends a 'RELAY_END' cell along the circuit (if possible) and closes the TCP connection immediately. If an edge node receives a 'RELAY_END' cell for any stream, it closes the TCP connection completely, and sends nothing more along the circuit for that stream.

The payload of a RELAY_END cell begins with a single 'reason' byte to describe why the stream is closing. For some reasons, it contains additional data (depending on the reason.) The values are:

       1 -- REASON_MISC           (catch-all for unlisted reasons)
       2 -- REASON_RESOLVEFAILED  (couldn't look up hostname)
       3 -- REASON_CONNECTREFUSED (remote host refused connection) [*]
       4 -- REASON_EXITPOLICY     (OR refuses to connect to host or port)
       5 -- REASON_DESTROY        (Circuit is being destroyed)
       6 -- REASON_DONE           (Anonymized TCP connection was closed)
       7 -- REASON_TIMEOUT        (Connection timed out, or OR timed out
                                   while connecting)
       8 -- REASON_NOROUTE        (Routing error while attempting to
                                   contact destination)
       9 -- REASON_HIBERNATING    (OR is temporarily hibernating)
      10 -- REASON_INTERNAL       (Internal error at the OR)
      11 -- REASON_RESOURCELIMIT  (OR has no resources to fulfill request)
      12 -- REASON_CONNRESET      (Connection was unexpectedly reset)
      13 -- REASON_TORPROTOCOL    (Sent when closing connection because of
                                   Tor protocol violations.)
      14 -- REASON_NOTDIRECTORY   (Client sent RELAY_BEGIN_DIR to a
                                   non-directory relay.)

   [*] Older versions of Tor also send this reason when connections are
       reset.

OPs and ORs MUST accept reasons not on the above list, since future versions of Tor may provide more fine-grained reasons.

For most reasons, the format of RELAY_END is:

      Reason                      [1 byte]

For REASON_EXITPOLICY, the format of RELAY_END is:

      Reason                      [1 byte]
      IPv4 or IPv6 address        [4 bytes or 16 bytes]
      TTL                         [4 bytes]

(If the TTL is absent, it should be treated as if it were 0xffffffff. If the address is absent or is the wrong length, the RELAY_END message should be processed anyway.)
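
The tolerant parsing rules for RELAY_END can be sketched as follows. The function name is hypothetical. An empty body is treated as REASON_MISC, a missing TTL as 0xffffffff, and an address of the wrong length is ignored without rejecting the message.

```python
import ipaddress
import struct

REASON_MISC = 1
REASON_EXITPOLICY = 4


def parse_relay_end(body: bytes):
    """Return (reason, address or None, ttl or None)."""
    if not body:
        return REASON_MISC, None, None
    reason, rest = body[0], body[1:]
    if reason != REASON_EXITPOLICY:
        return reason, None, None
    addr, ttl = None, 0xFFFFFFFF
    for addr_len in (4, 16):
        # Accept the address alone, or the address followed by a 4-byte TTL.
        if len(rest) in (addr_len, addr_len + 4):
            addr = ipaddress.ip_address(rest[:addr_len])
            if len(rest) == addr_len + 4:
                (ttl,) = struct.unpack(">I", rest[addr_len:])
            break
    return reason, addr, ttl
```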

Tors SHOULD NOT send any reason except REASON_MISC for a stream that they have originated.

Implementations SHOULD accept empty RELAY_END messages, and treat them as if they specified REASON_MISC.

Upon receiving a RELAY_END cell, the recipient may be sure that no further cells will arrive on that stream, and can treat such cells as a protocol violation.

After sending a RELAY_END cell, the sender needs to give the recipient time to receive that cell. In the meantime, the sender SHOULD remember how many cells of which types (CONNECTED, SENDME, DATA) that it would have accepted on that stream, and SHOULD kill the circuit if it receives more than permitted.

--- [The rest of this section describes unimplemented functionality.]

Because TCP connections can be half-open, we follow an equivalent to TCP's FIN/FIN-ACK/ACK protocol to close streams.

An exit (or onion service) connection can have a TCP stream in one of three states: 'OPEN', 'DONE_PACKAGING', and 'DONE_DELIVERING'. For the purposes of modeling transitions, we treat 'CLOSED' as a fourth state, although connections in this state are not, in fact, tracked by the onion router.

A stream begins in the 'OPEN' state. Upon receiving a 'FIN' from the corresponding TCP connection, the edge node sends a 'RELAY_FIN' cell along the circuit and changes its state to 'DONE_PACKAGING'. Upon receiving a 'RELAY_FIN' cell, an edge node sends a 'FIN' to the corresponding TCP connection (e.g., by calling shutdown(SHUT_WR)) and changes its state to 'DONE_DELIVERING'.

When a stream already in 'DONE_DELIVERING' receives a 'FIN', it also sends a 'RELAY_FIN' along the circuit, and changes its state to 'CLOSED'. When a stream already in 'DONE_PACKAGING' receives a 'RELAY_FIN' cell, it sends a 'FIN' and changes its state to 'CLOSED'.

If an edge node encounters an error on any stream, it sends a 'RELAY_END' cell (if possible) and closes the stream immediately.
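
Since this part of the section describes unimplemented functionality, the following is only a sketch of the state transitions as written above. The event names "tcp_fin" and "relay_fin" and the action strings are hypothetical labels chosen for the example.

```python
# (state, event) -> (next state, action to perform)
TRANSITIONS = {
    ("OPEN", "tcp_fin"): ("DONE_PACKAGING", "send RELAY_FIN"),
    ("OPEN", "relay_fin"): ("DONE_DELIVERING", "send FIN"),
    ("DONE_DELIVERING", "tcp_fin"): ("CLOSED", "send RELAY_FIN"),
    ("DONE_PACKAGING", "relay_fin"): ("CLOSED", "send FIN"),
}


def step(state: str, event: str):
    """Return (next_state, action); raise KeyError for undefined transitions."""
    return TRANSITIONS[(state, event)]
```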

Remote hostname lookup

To find the address associated with a hostname, the OP sends a RELAY_RESOLVE cell containing the hostname to be resolved with a NUL terminating byte. (For a reverse lookup, the OP sends a RELAY_RESOLVE cell containing an in-addr.arpa address.) The OR replies with a RELAY_RESOLVED cell containing any number of answers. Each answer is of the form:

       Type   (1 octet)
       Length (1 octet)
       Value  (variable-width)
       TTL    (4 octets)
   "Length" is the length of the Value field.
   "Type" is one of:

      0x00 -- Hostname
      0x04 -- IPv4 address
      0x06 -- IPv6 address
      0xF0 -- Error, transient
      0xF1 -- Error, nontransient

    If any answer has a type of 'Error', then no other answer may be
    given.

    The 'Value' field encodes the answer:
        IP addresses are given in network order.
        Hostnames are given in standard DNS order ("www.example.com")
          and not NUL-terminated.
        The content of Errors is currently ignored. Relays currently
          set it to the string "Error resolving hostname" with no
          terminating NUL. Implementations MUST ignore this value.

    For backward compatibility, if there are any IPv4 answers, one of those
    must be given as the first answer.

    The RELAY_RESOLVE cell must use a nonzero, distinct streamID; the
    corresponding RELAY_RESOLVED cell must use the same streamID.  No stream
    is actually created by the OR when resolving the name.
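
The answer format above is a simple type-length-value sequence with a trailing TTL. A minimal parser might look like this; the function name is hypothetical, and the input is assumed to be exactly the Data portion of the relay cell (that is, with cell padding already stripped).

```python
import struct


def parse_resolved(body: bytes):
    """Return a list of (type, value, ttl) tuples from a RELAY_RESOLVED body."""
    answers = []
    pos = 0
    while pos + 2 <= len(body):
        atype, length = body[pos], body[pos + 1]
        end = pos + 2 + length + 4          # header + Value + TTL
        if end > len(body):
            raise ValueError("truncated answer")
        value = body[pos + 2:pos + 2 + length]
        (ttl,) = struct.unpack(">I", body[end - 4:end])
        answers.append((atype, value, ttl))
        pos = end
    return answers
```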

Flow control

Link throttling

Each client or relay should do appropriate bandwidth throttling to keep its user happy.

Communicants rely on TCP's default flow control to push back when they stop reading.

The mainline Tor implementation uses token buckets (one for reads, one for writes) for the rate limiting.

Since 0.2.0.x, Tor has let the user specify an additional pair of token buckets for "relayed" traffic, so people can deploy a Tor relay with strict rate limiting, but also use the same Tor as a client. To avoid partitioning concerns we combine both classes of traffic over a given OR connection, and keep track of the last time we read or wrote a high-priority (non-relayed) cell. If it's been less than N seconds (currently N=30), we give the whole connection high priority, else we give the whole connection low priority. We also give low priority to reads and writes for connections that are serving directory information. See proposal 111 for details.

Link padding

Link padding can be created by sending PADDING or VPADDING cells along the connection; relay cells of type "DROP" can be used for long-range padding. The payloads of PADDING, VPADDING, or DROP cells are filled with padding bytes. See Section 3.

If the link protocol is version 5 or higher, link level padding is enabled as per padding-spec.txt. On these connections, clients may negotiate the use of padding with a CELL_PADDING_NEGOTIATE command whose format is as follows:

         Version           [1 byte]
         Command           [1 byte]
         ito_low_ms        [2 bytes]
         ito_high_ms       [2 bytes]

Currently, only version 0 of this cell is defined. In it, the command field is either 1 (stop padding) or 2 (start padding). For the start padding command, a pair of timeout values specifying a low and a high range bounds for randomized padding timeouts may be specified as unsigned integer values in milliseconds. The ito_low_ms field should not be lower than the current consensus parameter value for nf_ito_low (default: 1500). The ito_high_ms field should not be lower than ito_low_ms. (If any party receives an out-of-range value, they clamp it so that it is in-range.)

For the stop padding command, the timeout fields should be sent as zero (to avoid client distinguishability) and ignored by the recipient.
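
As an illustration, the body of a CELL_PADDING_NEGOTIATE command could be built as below. The function and constant names are hypothetical; the clamping mirrors the rules above, with the nf_ito_low consensus value passed in by the caller (defaulting to 1500).

```python
import struct

CMD_STOP = 1
CMD_START = 2


def encode_padding_negotiate(command: int, ito_low_ms: int = 0,
                             ito_high_ms: int = 0,
                             nf_ito_low: int = 1500) -> bytes:
    """Encode a version-0 padding negotiation body (6 bytes)."""
    if command not in (CMD_STOP, CMD_START):
        raise ValueError("unknown command")
    if command == CMD_STOP:
        # Timeouts are sent as zero to avoid client distinguishability.
        ito_low_ms = ito_high_ms = 0
    else:
        ito_low_ms = min(max(ito_low_ms, nf_ito_low), 0xFFFF)
        ito_high_ms = min(max(ito_high_ms, ito_low_ms), 0xFFFF)
    return struct.pack(">BBHH", 0, command, ito_low_ms, ito_high_ms)
```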

For more details on padding behavior, see padding-spec.txt.

Circuit-level flow control

To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many RELAY_DATA cells it is allowed to originate or willing to consume.

These two windows are respectively named: the package window (packaged for transmission) and the deliver window (delivered for local streams).

Because of our leaky-pipe topology, every relay on the circuit has a pair of windows, and the OP has a pair of windows for every relay on the circuit. These windows do not apply to relayed cells, however, and a relay that is never used for streams will never decrement its window or cause the client to decrement a window.

Each 'window' value is initially set based on the consensus parameter 'circwindow' in the directory (see dir-spec.txt), or to 1000 data cells if no 'circwindow' value is given. In each direction, cells that are not RELAY_DATA cells do not affect the window.

An OR or OP (depending on the stream direction) sends a RELAY_SENDME cell to indicate that it is willing to receive more cells when its deliver window goes down below a full increment (100). For example, if the window started at 1000, it should send a RELAY_SENDME when it reaches 900.

When an OR or OP receives a RELAY_SENDME, it increments its package window by a value of 100 (circuit window increment) and proceeds to sending the remaining RELAY_DATA cells.

If a package window reaches 0, the OR or OP stops reading from TCP connections for all streams on the corresponding circuit, and sends no more RELAY_DATA cells until receiving a RELAY_SENDME cell.

If a deliver window goes below 0, the circuit should be torn down.
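
The window bookkeeping above can be sketched as follows. The class and method names are hypothetical, and crediting the deliver window by one increment when a RELAY_SENDME is sent is an assumption made so that the example is complete.

```python
CIRCWINDOW_START = 1000      # default when no 'circwindow' is given
CIRCWINDOW_INCREMENT = 100


class CircuitWindows:
    def __init__(self, start: int = CIRCWINDOW_START) -> None:
        self.start = start
        self.package = start   # RELAY_DATA cells we may still originate
        self.deliver = start   # RELAY_DATA cells we are willing to consume

    def can_package(self) -> bool:
        return self.package > 0

    def on_data_packaged(self) -> None:
        if not self.can_package():
            raise RuntimeError("package window is 0; wait for RELAY_SENDME")
        self.package -= 1

    def on_sendme_received(self) -> None:
        self.package += CIRCWINDOW_INCREMENT

    def on_data_received(self) -> bool:
        """Return False if the circuit should be torn down."""
        self.deliver -= 1
        return self.deliver >= 0

    def take_sendme(self) -> bool:
        """Return True if a RELAY_SENDME should be sent now."""
        if self.deliver <= self.start - CIRCWINDOW_INCREMENT:
            self.deliver += CIRCWINDOW_INCREMENT  # assumption, see lead-in
            return True
        return False
```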

Starting with tor-0.4.1.1-alpha, authenticated SENDMEs are supported (version 1, see below). This means that both the OR and OP need to remember the rolling digest of the cell that precedes (triggers) a RELAY_SENDME. This can be known if the package window gets to a multiple of the circuit window increment (100).

When the RELAY_SENDME version 1 arrives, it will contain a digest that MUST match the one remembered. This represents a proof that the end point of the circuit saw the sent cells. On failure to match, the circuit should be torn down.

To ensure unpredictability, random bytes should be added to at least one RELAY_DATA cell within one increment window. In other words, every 100 cells (increment), random bytes should be introduced in at least one cell.

SENDME Cell Format

A circuit-level RELAY_SENDME cell always has its StreamID=0.

An OR or OP must obey these two consensus parameters in order to know which version to emit and accept.

      'sendme_emit_min_version': Minimum version to emit.
      'sendme_accept_min_version': Minimum version to accept.

If a RELAY_SENDME version is received that is below the minimum accepted version, the circuit should be closed.

The RELAY_SENDME payload contains the following:

      VERSION     [1 byte]
      DATA_LEN    [2 bytes]
      DATA        [DATA_LEN bytes]

The VERSION tells us what is expected in the DATA section of length DATA_LEN and how to handle it. The recognized values are:

      0x00: The rest of the payload should be ignored.

      0x01: Authenticated SENDME. The DATA section MUST contain:

         DIGEST   [20 bytes]

         If the DATA_LEN value is less than 20 bytes, the cell should be
         dropped and the circuit closed. If the value is more than 20 bytes,
         then the first 20 bytes should be read to get the DIGEST value.

         The DIGEST is the rolling digest value from the RELAY_DATA cell that
         immediately preceded (triggered) this RELAY_SENDME. The receiving
         side compares this value with the digest that it remembered for the
         corresponding cell it sent.

         (Note that if the digest in use has an output length greater than 20
         bytes—as is the case for the hop of an onion service rendezvous
         circuit created by the hs_ntor handshake—we truncate the digest
         to 20 bytes here.)

If the VERSION is unrecognized or below the minimum accepted version (taken from the consensus), the circuit should be torn down.
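As an illustration of the payload rules above, here is a minimal parsing sketch. The function and exception names are hypothetical. It assumes DATA_LEN is encoded in network byte order, and it treats an empty payload as malformed, which the text above does not settle.

```python
import struct


class CloseCircuit(Exception):
    """Raised when the rules above call for closing the circuit."""


def parse_sendme_payload(payload: bytes, accept_min_version: int = 0):
    """Return (version, digest); digest is None for version 0."""
    if len(payload) < 1:
        raise CloseCircuit("empty payload (assumed malformed)")
    version = payload[0]
    if version < accept_min_version:
        raise CloseCircuit("version below sendme_accept_min_version")
    if version == 0x00:
        return 0, None  # the rest of the payload is ignored
    if version == 0x01:
        if len(payload) < 3:
            raise CloseCircuit("missing DATA_LEN")
        (data_len,) = struct.unpack_from("!H", payload, 1)
        data = payload[3:3 + data_len]
        if data_len < 20 or len(data) < 20:
            raise CloseCircuit("DATA shorter than a 20-byte DIGEST")
        return 1, data[:20]  # extra bytes beyond the digest are ignored
    raise CloseCircuit("unrecognized SENDME version")
```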

Stream-level flow control

Edge nodes use RELAY_SENDME cells to implement end-to-end flow control for individual connections across circuits. Similarly to circuit-level flow control, edge nodes begin with a window of cells (500) per stream, and increment the window by a fixed value (50) upon receiving a RELAY_SENDME cell. Edge nodes initiate RELAY_SENDME cells when both a) the window is <= 450, and b) there are fewer than ten cell payloads remaining to be flushed at that edge.

Stream-level RELAY_SENDME cells are distinguished by having nonzero StreamID. They are still empty; the body still SHOULD be ignored.
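The stream-level conditions can be sketched as below. The class name and method names are hypothetical, and the sketch assumes that the sender of a RELAY_SENDME credits its own deliver window by the same increment (50), mirroring circuit-level behaviour.

```python
STREAMWINDOW_START = 500
STREAMWINDOW_INCREMENT = 50


class StreamFlow:
    """Per-stream window accounting at an edge node."""

    def __init__(self):
        self.package_window = STREAMWINDOW_START
        self.deliver_window = STREAMWINDOW_START

    def on_sendme_received(self) -> None:
        """The peer acknowledged data: we may package more cells."""
        self.package_window += STREAMWINDOW_INCREMENT

    def on_data_received(self, queued_payloads: int) -> int:
        """Account for one RELAY_DATA cell; return how many SENDMEs to send.

        queued_payloads is the number of cell payloads still waiting to be
        flushed at this edge.
        """
        self.deliver_window -= 1
        to_send = 0
        while (self.deliver_window <= STREAMWINDOW_START - STREAMWINDOW_INCREMENT
               and queued_payloads < 10):
            self.deliver_window += STREAMWINDOW_INCREMENT
            to_send += 1
        return to_send
```

The loop (rather than a single check) covers the case where the window fell well below 450 while the local queue was too full to acknowledge.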

Handling resource exhaustion

Memory exhaustion.

(See also dos-spec.md.)

If RAM becomes low, an OR should begin destroying circuits until more memory is free again. We recommend the following algorithm:

     - Set a threshold amount of RAM to recover at 10% of the total RAM.

     - Sort the circuits by their 'staleness', defined as the age of the
       oldest data queued on the circuit.  This data can be:

          * Bytes that are waiting to flush to or from a stream on that
            circuit.

          * Bytes that are waiting to flush from a connection created with
            BEGIN_DIR.

          * Cells that are waiting to flush or be processed.

     - While we have not yet recovered enough RAM:

          * Free all memory held by the most stale circuit, and send DESTROY
            cells in both directions on that circuit.  Count the amount of
            memory we recovered towards the total.
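A minimal sketch of the recommended algorithm follows. The Circuit record and its fields are hypothetical stand-ins for whatever accounting an implementation keeps; the sketch only selects victims and leaves freeing memory and sending DESTROY cells to the caller.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    circ_id: int
    staleness: float   # age in seconds of the oldest data queued on the circuit
    bytes_held: int    # memory that would be recovered by destroying it


def pick_circuits_to_destroy(circuits, total_ram: int):
    """Return (circuit IDs to destroy, bytes recovered), most stale first."""
    threshold = total_ram // 10          # recover 10% of the total RAM
    recovered = 0
    victims = []
    for circ in sorted(circuits, key=lambda c: c.staleness, reverse=True):
        if recovered >= threshold:
            break
        victims.append(circ.circ_id)
        recovered += circ.bytes_held
    return victims, recovered
```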

Subprotocol versioning

This section specifies the Tor subprotocol versioning. Subprotocols are broken down into different types, each with its current version numbers. Any new version number should be added to this section.

The dir-spec.txt details how those versions are encoded. See the "proto"/"pr" line in a descriptor and the "recommended-relay-protocols", "required-relay-protocols", "recommended-client-protocols" and "required-client-protocols" lines in the vote/consensus format.

Here are the rules a relay and client should follow when encountering a protocol list in the consensus:

      - When a relay lacks a protocol listed in recommended-relay-protocols,
        it should warn its operator that the relay is obsolete.

      - When a relay lacks a protocol listed in required-relay-protocols, it
        should warn its operator as above. If the consensus is newer than the
        date when the software was released or scheduled for release, it must
        not attempt to join the network.

      - When a client lacks a protocol listed in recommended-client-protocols,
        it should warn the user that the client is obsolete.

      - When a client lacks a protocol listed in required-client-protocols,
        it should warn the user as above.  If the consensus is newer than the
        date when the software was released, it must not connect to the
        network.  This implements a "safe forward shutdown" mechanism for
        zombie clients.

      - If a client or relay has a cached consensus telling it that a given
        protocol is required, and it does not implement that protocol, it
        SHOULD NOT try to fetch a newer consensus.

Software release dates SHOULD be automatically updated as part of the release process, to prevent forgetting to move them forward. Software release dates MAY be manually adjusted by maintainers if necessary.
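The rules above can be condensed into a small decision function. This is only a sketch: the function name and the action strings are hypothetical, and both timestamps are assumed to be comparable values (for example, seconds since the epoch).

```python
def protocol_actions(role, lacks_recommended, lacks_required,
                     consensus_time, release_time):
    """Return the list of actions for a 'relay' or 'client'.

    lacks_recommended / lacks_required say whether the software is missing
    a protocol from the corresponding consensus line.
    """
    if role not in ("relay", "client"):
        raise ValueError("role must be 'relay' or 'client'")
    actions = []
    if lacks_recommended or lacks_required:
        actions.append("warn")  # warn the operator or user: software is obsolete
    if lacks_required:
        if consensus_time > release_time:
            # Consensus is newer than the release: stay off the network.
            actions.append("do_not_join" if role == "relay" else "do_not_connect")
        # A cached consensus already says we are too old; do not fetch a newer one.
        actions.append("do_not_fetch_newer_consensus")
    return actions
```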

Starting in version 0.2.9.4-alpha, the initial required protocols for clients that we will Recommend and Require are:

      Cons=1-2 Desc=1-2 DirCache=1 HSDir=1 HSIntro=3 HSRend=1 Link=4
      LinkAuth=1 Microdesc=1-2 Relay=2

   For relays we will Require:

      Cons=1 Desc=1 DirCache=1 HSDir=1 HSIntro=3 HSRend=1 Link=3-4
      LinkAuth=1 Microdesc=1 Relay=1-2

For relays, we will additionally Recommend all protocols which we recommend for clients.
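To show how such a list might be compared against local capabilities, here is a small parsing sketch. The function names are hypothetical; the handling of comma-separated ranges (such as "Link=1-3,5") is an assumption about the encoding, which dir-spec.txt defines authoritatively.

```python
from typing import Dict, Set


def parse_protocol_list(line: str) -> Dict[str, Set[int]]:
    """Parse e.g. 'Cons=1-2 Link=4' into {'Cons': {1, 2}, 'Link': {4}}."""
    protocols: Dict[str, Set[int]] = {}
    for entry in line.split():
        name, sep, ranges = entry.partition("=")
        if not name or not sep:
            raise ValueError("malformed entry: %r" % entry)
        versions = protocols.setdefault(name, set())
        for part in ranges.split(","):
            low, _, high = part.partition("-")
            versions.update(range(int(low), int(high or low) + 1))
    return protocols


def missing_protocols(needed: Dict[str, Set[int]],
                      supported: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Return the versions in 'needed' that 'supported' lacks."""
    missing = {}
    for name, versions in needed.items():
        lacking = versions - supported.get(name, set())
        if lacking:
            missing[name] = lacking
    return missing
```

For example, parsing the client list above and comparing it with a hypothetical local set "Cons=1 Link=1-5 Relay=1-2" reports Cons version 2 as missing, along with every protocol the local set omits entirely.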

"Link"

The "link" protocols are those used by clients and relays to initiate and receive OR connections and to handle cells on OR connections. The "link" protocol versions correspond 1:1 to the link protocol versions negotiated with VERSIONS cells.

Two Tor instances can make a connection to each other only if they have at least one link protocol in common.

The current "link" versions are: "1" through "5". See section 4.1 for more information. All current Tor versions support "1-3"; versions from 0.2.4.11-alpha and on support "1-4"; versions from 0.3.1.1-alpha and on support "1-5". Eventually we will drop "1" and "2".

"LinkAuth"

LinkAuth protocols correspond to varieties of Authenticate cells used for the v3+ link protocols.

Current versions are:

"1" is the RSA link authentication described in section 4.4.1 above.

"2" is unused, and reserved by proposal 244.

"3" is the ed25519 link authentication described in 4.4.2 above.

"Relay"

The "relay" protocols are those used to handle CREATE/CREATE2 cells, and those that handle the various RELAY cell types received after a CREATE/CREATE2 cell. (Except, relay cells used to manage introduction and rendezvous points are managed with the "HSIntro" and "HSRend" protocols respectively.)

Current versions are:

   "1" -- supports the TAP key exchange, with all features in Tor 0.2.3.
          Support for CREATE and CREATED and CREATE_FAST and CREATED_FAST
          and EXTEND and EXTENDED.

   "2" -- supports the ntor key exchange, and all features in Tor
          0.2.4.19.  Includes support for CREATE2 and CREATED2 and
          EXTEND2 and EXTENDED2.

          Relay=2 has limited IPv6 support:
            * Clients might not include IPv6 ORPorts in EXTEND2 cells.
            * Relays (and bridges) might not initiate IPv6 connections in
              response to EXTEND2 cells containing IPv6 ORPorts, even if they
              are configured with an IPv6 ORPort.

          However, relays support accepting inbound connections to their IPv6
          ORPorts. And they might extend circuits via authenticated IPv6
          connections to other relays.

   "3" -- relays support extending over IPv6 connections in response to an
          EXTEND2 cell containing an IPv6 ORPort.

          Bridges might not extend over IPv6, because they try to imitate
          client behaviour.

          A successful IPv6 extend requires:
            * Relay subprotocol version 3 (or later) on the extending relay,
            * an IPv6 ORPort on the extending relay,
            * an IPv6 ORPort for the accepting relay in the EXTEND2 cell, and
            * an IPv6 ORPort on the accepting relay.
          (Because different tor instances can have different views of the
          network, these checks should be done when the path is selected.
          Extending relays should only check local IPv6 information, before
          attempting the extend.)

          When relays receive an EXTEND2 cell containing both an IPv4 and an
          IPv6 ORPort, and there is no existing authenticated connection with
          the target relay, the extending relay may choose between IPv4 and
          IPv6 at random. The extending relay might not try the other address,
          if the first connection fails.

          As is the case with other subprotocol versions, tor advertises,
          recommends, or requires support for this protocol version, regardless
          of its current configuration.

          In particular:
            * relays without an IPv6 ORPort, and
            * tor instances that are not relays,
          have the following behaviour, regardless of their configuration:
            * advertise support for "Relay=3" in their descriptor
              (if they are a relay, bridge, or directory authority), and
            * react to consensuses recommending or requiring support for
              "Relay=3".

          This subprotocol version is described in proposal 311, and
          implemented in Tor 0.4.5.1-alpha.

   "4" -- support the ntorv3 (version 3) key exchange and all features in
          0.4.7.3-alpha. This adds a new CREATE2 cell type. See proposal 332
          for more details.

"HSIntro"

The "HSIntro" protocol handles introduction points.

   "3" -- supports authentication as of proposal 121 in Tor
          0.2.1.6-alpha.

   "4" -- support ed25519 authentication keys which is defined by the HS v3
          protocol as part of proposal 224 in Tor 0.3.0.4-alpha.

   "5" -- support ESTABLISH_INTRO cell DoS parameters extension for onion
          service version 3 only in Tor 0.4.2.1-alpha.

"HSRend"

The "HSRend" protocol handles rendezvous points.

"1" -- supports all features in Tor 0.0.6.

   "2" -- supports RENDEZVOUS2 cells of arbitrary length as long as they
          have 20 bytes of cookie in Tor 0.2.9.1-alpha.

"HSDir"

The "HSDir" protocols are the set of hidden service document types that can be uploaded to, understood by, and downloaded from a tor relay, and the set of URLs available to fetch them.

"1" -- supports all features in Tor 0.2.0.10-alpha.

   "2" -- support ed25519 blinded keys request which is defined by the HS v3
          protocol as part of proposal 224 in Tor 0.3.0.4-alpha.

"DirCache"

The "DirCache" protocols are the set of documents available for download from a directory cache via BEGIN_DIR, and the set of URLs available to fetch them. (This excludes URLs for hidden service objects.)

"1" -- supports all features in Tor 0.2.4.19.

"2" -- adds support for consensus diffs in Tor 0.3.1.1-alpha.

"Desc"

Describes features present or absent in descriptors.

Most features in descriptors don't require a "Desc" update -- only those that need to someday be required. For example, someday clients will need to understand ed25519 identities.

"1" -- supports all features in Tor 0.2.4.19.

"2" -- cross-signing with onion-keys, signing with ed25519 identities.

"Microdesc"

Describes features present or absent in microdescriptors.

Most features in microdescriptors don't require a "Microdesc" update -- only those that need to someday be required. These correspond more or less with consensus methods.

"1" -- consensus methods 9 through 20.

"2" -- consensus method 21 (adds ed25519 keys to microdescs).

"Cons"

Describes features present or absent in consensus documents.

Most features in consensus documents don't require a "Cons" update -- only those that need to someday be required.

These correspond more or less with consensus methods.

"1" -- consensus methods 9 through 20.

"2" -- consensus method 21 (adds ed25519 keys to microdescs).

"Padding"

Describes the padding capabilities of the relay.

   "1" -- [DEFUNCT] Relay supports circuit-level padding. This version MUST NOT
          be used as it was also enabled in relays that don't actually support
          circuit-level padding. Advertised by Tor versions from
          tor-0.4.0.1-alpha and only up to and including tor-0.4.1.4-rc.

   "2" -- Relay supports the HS circuit setup padding machines (proposal 302).
          Advertised by Tor versions from tor-0.4.1.5 and onwards.

"FlowCtrl"

Describes the flow control protocol at the circuit and stream level. If there is no FlowCtrl advertised, tor supports the unauthenticated flow control features (version 0).

   "1" -- supports authenticated circuit level SENDMEs as of proposal 289 in
          Tor 0.4.1.1-alpha.

   "2" -- supports congestion control by the Exits which implies a new SENDME
          format and algorithm. See proposal 324 for more details. Advertised
          in tor 0.4.7.3-alpha.

"Datagram"

Describes the UDP protocol capabilities of a relay.

   "1" -- [RESERVED] supports UDP by an Exit as in the relay command
          CONNECT_UDP, CONNECTED_UDP and DATAGRAM. See proposal
          339 for more details. (Not yet advertised, reserved)

Ed25519 certificates in Tor

Table of Contents

    1. Scope and Preliminaries
        1.1. Signing
        1.2. Integer encoding
    2. Document formats
        2.1. Ed25519 Certificates
        2.2. Basic extensions
            2.2.1. Signed-with-ed25519-key extension [type 04]
        2.3. RSA->Ed25519 cross-certificate
    A.1. List of certificate types (CERT_TYPE field)
    A.2. List of extension types
    A.3. List of signature prefixes
    A.4. List of certified key types (CERT_KEY_TYPE field)

Scope and Preliminaries

This document describes a certificate format that Tor uses for its Ed25519 internal certificates. It is not the only certificate format that Tor uses. For the certificates that authorities use for their signing keys, see dir-spec.txt. Additionally, Tor uses TLS, which depends on X.509 certificates; see tor-spec.txt for details.

The certificates in this document were first introduced in proposal 220, and were first supported by Tor in Tor version 0.2.7.2-alpha.

Signing

All signatures here, unless otherwise specified, are computed using an Ed25519 key.

In order to future-proof the format, before signing anything, the signed document is prefixed with a personalization string, which will be different in each case.

Integer encoding

Network byte order (big-endian) is used to encode all integer values in Ed25519 certificates unless explicitly specified otherwise.

Document formats

Ed25519 Certificates

When generating a signing key, we also generate a certificate for it. Unlike the certificates for authorities' signing keys, these certificates need to be sent around frequently, in significant numbers. So we'll choose a compact representation.

         VERSION         [1 Byte]
         CERT_TYPE       [1 Byte]
         EXPIRATION_DATE [4 Bytes]
         CERT_KEY_TYPE   [1 byte]
         CERTIFIED_KEY   [32 Bytes]
         N_EXTENSIONS    [1 byte]
         EXTENSIONS      [N_EXTENSIONS times]
         SIGNATURE       [64 Bytes]

The "VERSION" field holds the value [01]. The "CERT_TYPE" field holds a value depending on the type of certificate. (See appendix A.1.) The CERTIFIED_KEY field is an Ed25519 public key if CERT_KEY_TYPE is [01], or a digest of some other key type depending on the value of CERT_KEY_TYPE. (See appendix A.4.) The EXPIRATION_DATE is a date, given in HOURS since the epoch, after which this certificate isn't valid. (A four-byte field counting hours holds 2^32 hours, about 490,000 years, so it will not overflow until roughly the year 491,900 A.D.)

The EXTENSIONS field contains zero or more extensions, each of the format:

         ExtLength [2 bytes]
         ExtType   [1 byte]
         ExtFlags  [1 byte]
         ExtData   [ExtLength bytes]

   The meaning of the ExtData field in an extension is type-dependent.

   The ExtFlags field holds flags; this flag is currently defined:

      1 -- AFFECTS_VALIDATION. If this flag is present, then the
           extension affects whether the certificate is valid; clients
           must not accept the certificate as valid unless they
           understand the extension.

It is an error for an extension to be truncated; such a certificate is invalid.

Before processing any certificate, parties SHOULD know which identity key it is supposed to be signed by, and then check the signature. The signature is created by signing all the fields in the certificate up until "SIGNATURE" (that is, signing sizeof(ed25519_cert) - 64 bytes).
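The layout above can be decoded with a few fixed offsets. The following is a parsing sketch only: the function name and the returned dictionary keys are hypothetical, and it does not verify the signature (the caller would check "signature" over "signed_portion" with the expected identity key, using an Ed25519 library).

```python
import struct

KNOWN_EXTENSION_TYPES = {0x04}   # signed-with-ed25519-key
AFFECTS_VALIDATION = 0x01
SIGNATURE_LEN = 64
HEADER_LEN = 40                  # all fields before EXTENSIONS


def parse_ed25519_cert(cert: bytes) -> dict:
    if len(cert) < HEADER_LEN + SIGNATURE_LEN:
        raise ValueError("certificate too short")
    version, cert_type, expiration_hours, key_type = struct.unpack_from(
        "!BBIB", cert, 0)
    if version != 0x01:
        raise ValueError("unsupported certificate version")
    certified_key = cert[7:39]
    n_extensions = cert[39]
    body_end = len(cert) - SIGNATURE_LEN
    pos = HEADER_LEN
    extensions = []
    for _ in range(n_extensions):
        if pos + 4 > body_end:
            raise ValueError("truncated extension header")
        ext_len, ext_type, ext_flags = struct.unpack_from("!HBB", cert, pos)
        pos += 4
        if pos + ext_len > body_end:
            raise ValueError("truncated extension data")
        if ext_flags & AFFECTS_VALIDATION and ext_type not in KNOWN_EXTENSION_TYPES:
            raise ValueError("unrecognized extension affects validation")
        extensions.append((ext_type, ext_flags, cert[pos:pos + ext_len]))
        pos += ext_len
    if pos != body_end:
        raise ValueError("unexpected bytes before signature")
    return {
        "cert_type": cert_type,
        "expiration_hours": expiration_hours,   # hours since the epoch
        "cert_key_type": key_type,
        "certified_key": certified_key,
        "extensions": extensions,
        "signed_portion": cert[:body_end],      # everything up to SIGNATURE
        "signature": cert[body_end:],
    }
```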

Basic extensions

Signed-with-ed25519-key extension [type 04]

In several places, it's desirable to bundle the key signing a certificate along with the certificate. We do so with this extension.

        ExtLength = 32
        ExtData =
           An ed25519 key    [32 bytes]

When this extension is present, it MUST match the key used to sign the certificate.

RSA->Ed25519 cross-certificate

Certificate type [07] (Cross-certification of Ed25519 identity with RSA key) contains the following data:

       ED25519_KEY                       [32 bytes]
       EXPIRATION_DATE                   [4 bytes]
       SIGLEN                            [1 byte]
       SIGNATURE                         [SIGLEN bytes]

Here, the Ed25519 identity key is signed with the router's RSA identity key, to indicate that authenticating with a key certified by the Ed25519 key counts as certifying with the RSA identity key. (The signature is computed on the SHA256 hash of the non-signature parts of the certificate, prefixed with the string "Tor TLS RSA/Ed25519 cross-certificate".)

Just like with the Ed25519 certificates above, the EXPIRATION_DATE operates in HOURS after the epoch.

This certificate type is used to mean, "This Ed25519 identity key acts with the authority of the RSA key that signed this certificate."

List of certificate types (CERT_TYPE field)

The values marked with asterisks are not types corresponding to the certificate format of section 2.1. Instead, they are reserved for RSA-signed certificates to avoid conflicts between the certificate type enumeration of the CERTS cell and the certificate type enumeration of our Ed25519 certificates.

   **[00],[01],[02],[03] - Reserved to avoid conflict with types used
          in CERTS cells.

   [04] - Ed25519 signing key with an identity key
          (see prop220 section 4.2)

   [05] - TLS link certificate signed with ed25519 signing key
          (see prop220 section 4.2)

   [06] - Ed25519 authentication key signed with ed25519 signing key
          (see prop220 section 4.2)

   **[07] - Reserved for RSA identity cross-certification;
          (see section 2.3 above, and tor-spec.txt section 4.2)

   [08] - Onion service: short-term descriptor signing key, signed
          with blinded public key.
          (See rend-spec-v3.txt, section [DESC_OUTER])

   [09] - Onion service: intro point authentication key, cross-certifying the
          descriptor signing key.
          (See rend-spec-v3.txt, description of "auth-key")

   [0A] - ntor onion key cross-certifying ed25519 identity key
          (see dir-spec.txt, description of "ntor-onion-key-crosscert")

   [0B] - Onion service: ntor-extra encryption key, cross-certifying
          descriptor signing key.
          (see rend-spec-v3.txt, description of "enc-key-cert")

List of extension types

[04] - signed-with-ed25519-key (section 2.2.1)

List of signature prefixes

We describe various documents as being signed with a prefix. Here are those prefixes:

"Tor router descriptor signature v1" (see dir-spec.txt)

List of certified key types (CERT_KEY_TYPE field)

   [01] ed25519 key
   [02] SHA256 hash of an RSA key. (Not currently used.)
   [03] SHA256 hash of an X.509 certificate. (Used with certificate
        type 5.)

(NOTE: Up till 0.4.5.1-alpha, all versions of Tor have incorrectly used "01" for all types of certified key. Implementations SHOULD allow "01" in this position, and infer the actual key type from the CERT_TYPE field.)

Tor directory protocol, version 3

Table of Contents

    0. Scope and preliminaries
        0.1. History
        0.2. Goals of the version 3 protocol
        0.3. Some Remaining questions
    1. Outline
        1.1. What's different from version 2?
        1.2. Document meta-format
        1.3. Signing documents
        1.4. Voting timeline
    2. Router operation and formats
        2.1. Uploading server descriptors and extra-info documents
            2.1.1. Server descriptor format
            2.1.2. Extra-info document format
            2.1.3. Nonterminals in server descriptors
    3. Directory authority operation and formats
        3.1. Creating key certificates
        3.2. Accepting server descriptor and extra-info document uploads
        3.3. Computing microdescriptors
        3.4. Exchanging votes
            3.4.1. Vote and consensus status document formats
            3.4.2. Assigning flags in a vote
            3.4.3. Serving bandwidth list files
        3.5. Downloading missing certificates from other directory authorities
        3.6. Downloading server descriptors from other directory authorities
        3.7. Downloading extra-info documents from other directory authorities
        3.8. Computing a consensus from a set of votes
            3.8.0.1. Deciding which Ids to include.
            3.8.0.2. Deciding which descriptors to include
            3.8.1. Forward compatibility
            3.8.2. Encoding port lists
            3.8.3. Computing Bandwidth Weights
        3.9. Computing consensus flavors
            3.9.1. ns consensus
            3.9.2. Microdescriptor consensus
        3.10. Exchanging detached signatures
        3.11. Publishing the signed consensus
    4. Directory cache operation
        4.1. Downloading consensus status documents from directory authorities
        4.2. Downloading server descriptors from directory authorities
        4.3. Downloading microdescriptors from directory authorities
        4.4. Downloading extra-info documents from directory authorities
        4.5. Consensus diffs
            4.5.1. Consensus diff format
            4.5.2. Serving and requesting diff
        4.6 Retrying failed downloads
    5. Client operation
        5.1. Downloading network-status documents
        5.2. Downloading server descriptors or microdescriptors
        5.3. Downloading extra-info documents
        5.4. Using directory information
        5.4.1. Choosing routers for circuits.
        5.4.2. Managing naming
        5.4.3. Software versions
        5.4.4. Warning about a router's status.
        5.5. Retrying failed downloads
    6. Standards compliance
        6.1. HTTP headers
        6.2. HTTP status codes
            A. Consensus-negotiation timeline.
            B. General-use HTTP URLs
            C. Converting a curve25519 public key to an ed25519 public key
            D. Inferring missing proto lines.
            E. Limited ed diff format

Scope and preliminaries

This directory protocol is used by Tor version 0.2.0.x-alpha and later. See dir-spec-v1.txt for information on the protocol used up to the 0.1.0.x series, and dir-spec-v2.txt for information on the protocol used by the 0.1.1.x and 0.1.2.x series.

This document merges and supersedes the following proposals:

      101  Voting on the Tor Directory System
      103  Splitting identity key from regularly used signing key
      104  Long and Short Router Descriptors

XXX timeline XXX fill in XXXXs

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

History

The earliest versions of Onion Routing shipped with a list of known routers and their keys. When the set of routers changed, users needed to fetch a new list.

The Version 1 Directory protocol

Early versions of Tor (0.0.2) introduced "Directory authorities": servers that served signed "directory" documents containing a list of signed "server descriptors", along with a short summary of the status of each router. Thus, clients could get up-to-date information on the state of the network automatically, and be certain that the list they were getting was attested by a trusted directory authority.

Later versions (0.0.8) added directory caches, which download directories from the authorities and serve them to clients. Non-caches fetch from the caches in preference to fetching from the authorities, thus distributing bandwidth requirements.

Also added during the version 1 directory protocol were "router status" documents: short documents that listed only the up/down status of the routers on the network, rather than a complete list of all the descriptors. Clients and caches would fetch these documents far more frequently than they would fetch full directories.

The Version 2 Directory Protocol

During the Tor 0.1.1.x series, Tor revised its handling of directory documents in order to address two major problems:

      * Directories had grown quite large (over 1MB), and most directory
        downloads consisted mainly of server descriptors that clients
        already had.

      * Every directory authority was a trust bottleneck: if a single
        directory authority lied, it could make clients believe for a time
        an arbitrarily distorted view of the Tor network.  (Clients
        trusted the most recent signed document they downloaded.) Thus,
        adding more authorities would make the system less secure, not
        more.

To address these, we extended the directory protocol so that authorities now published signed "network status" documents. Each network status listed, for every router in the network: a hash of its identity key, a hash of its most recent descriptor, and a summary of what the authority believed about its status. Clients would download the authorities' network status documents in turn, and believe statements about routers iff they were attested to by more than half of the authorities.

Instead of downloading all server descriptors at once, clients downloaded only the descriptors that they did not have. Descriptors were indexed by their digests, in order to prevent malicious caches from giving different versions of a server descriptor to different clients.

Routers began working harder to upload new descriptors only when their contents were substantially changed.

Goals of the version 3 protocol

Version 3 of the Tor directory protocol tries to solve the following issues:

      * A great deal of bandwidth used to transmit server descriptors was
        used by two fields that are not actually used by Tor routers
        (namely read-history and write-history).  We save about 60% by
        moving them into a separate document that most clients do not
        fetch or use.

      * It was possible under certain perverse circumstances for clients
        to download an unusual set of network status documents, thus
        partitioning themselves from clients who have a more recent and/or
        typical set of documents.  Even under the best of circumstances,
        clients were sensitive to the ages of the network status documents
        they downloaded.  Therefore, instead of having the clients
        correlate multiple network status documents, we have the
        authorities collectively vote on a single consensus network status
        document.

      * The most sensitive data in the entire network (the identity keys
        of the directory authorities) needed to be stored unencrypted so
        that the authorities can sign network-status documents on the fly.
        Now, the authorities' identity keys are stored offline, and used
        to certify medium-term signing keys that can be rotated.

Some Remaining questions

Things we could solve on a v3 timeframe:

The SHA-1 hash is showing its age. We should do something about our dependency on it. We could probably future-proof ourselves here in this revision, at least so far as documents from the authorities are concerned.

Too many things about the authorities are hardcoded by IP.

Perhaps we should start accepting longer identity keys for routers too.

Things to solve eventually:

Requiring every client to know about every router won't scale forever.

Requiring every directory cache to know every router won't scale forever.

Outline

There is a small set (say, around 5-10) of semi-trusted directory authorities. A default list of authorities is shipped with the Tor software. Users can change this list, but are encouraged not to do so, in order to avoid partitioning attacks.

Every authority has a very-secret, long-term "Authority Identity Key". This is stored encrypted and/or offline, and is used to sign "key certificate" documents. Every key certificate contains a medium-term (3-12 months) "authority signing key", that is used by the authority to sign other directory information. (Note that the authority identity key is distinct from the router identity key that the authority uses in its role as an ordinary router.)

Routers periodically upload signed "router descriptors" to the directory authorities describing their keys, capabilities, and other information. Routers may also upload signed "extra-info documents" containing information that is not required for the Tor protocol. Directory authorities serve server descriptors indexed by router identity, or by hash of the descriptor.

Routers may act as directory caches to reduce load on the directory authorities. They announce this in their descriptors.

Periodically, each directory authority generates a view of the current descriptors and status for known routers. They send a signed summary of this view (a "status vote") to the other authorities. The authorities compute the result of this vote, and sign a "consensus status" document containing the result of the vote.

Directory caches download, cache, and re-serve consensus documents.

Clients, directory caches, and directory authorities all use consensus documents to find out when their list of routers is out-of-date. (Directory authorities also use vote statuses.) If it is, they download any missing server descriptors. Clients download missing descriptors from caches; caches and authorities download from authorities. Descriptors are downloaded by the hash of the descriptor, not by the relay's identity key: this prevents directory servers from attacking clients by giving them descriptors nobody else uses.

All directory information is uploaded and downloaded with HTTP.

What's different from version 2?

Clients used to download multiple network status documents, corresponding roughly to "status votes" above. They would compute the result of the vote on the client side.

Authorities used to sign documents using the same private keys they used for their roles as routers. This forced them to keep these extremely sensitive keys in memory unencrypted.

All of the information in extra-info documents used to be kept in the main descriptors.

Document meta-format

Server descriptors, directories, and running-routers documents all obey the following lightweight extensible information format.

The highest level object is a Document, which consists of one or more Items. Every Item begins with a KeywordLine, followed by zero or more Objects. A KeywordLine begins with a Keyword, optionally followed by whitespace and more non-newline characters, and ends with a newline. A Keyword is a sequence of one or more characters in the set [A-Za-z0-9-], but may not start with -. An Object is a block of encoded data in pseudo-Privacy-Enhanced-Mail (PEM) style format: that is, lines of encoded data MAY be wrapped by inserting an ascii linefeed ("LF", also called newline, or "NL" here) character (cf. RFC 4648 §3.1). When line wrapping, implementations MUST wrap lines at 64 characters. Upon decoding, implementations MUST ignore and discard all linefeed characters.

More formally:

        NL = The ascii LF character (hex value 0x0a).
        Document ::= (Item | NL)+
        Item ::= KeywordLine Object?
        KeywordLine ::= Keyword (WS Argument)* NL
        Keyword = KeywordStart KeywordChar*
        KeywordStart ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9'
        KeywordChar ::= KeywordStart | '-'
        Argument := ArgumentChar+
        ArgumentChar ::= any graphical printing ASCII character.
        WS = (SP | TAB)+
        Object ::= BeginLine Base64-encoded-data EndLine
        BeginLine ::= "-----BEGIN " Keyword (" " Keyword)* "-----" NL
        EndLine ::= "-----END " Keyword (" " Keyword)* "-----" NL

A Keyword may not be "-----BEGIN".

The BeginLine and EndLine of an Object must use the same keyword.
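A small tokenizer makes the meta-format concrete. This is a lenient sketch, not a conforming parser: the function name is hypothetical, it follows the prose in accepting zero or more Objects per Item, it tolerates trailing whitespace on a KeywordLine, and it does not validate the keywords inside BEGIN/END lines beyond requiring them to match.

```python
import base64
import re

KEYWORD_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")
BEGIN, END, DASHES = "-----BEGIN ", "-----END ", "-----"


def parse_document(text: str):
    """Return a list of (keyword, arguments, objects) tuples.

    Each object is a (tag, decoded_bytes) pair.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                       # the document ends with NL
    items = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if line == "":
            continue                      # Document ::= (Item | NL)+
        parts = re.split(r"[ \t]+", line)
        keyword, args = parts[0], [a for a in parts[1:] if a]
        if not KEYWORD_RE.fullmatch(keyword):
            raise ValueError("bad keyword line: %r" % line)
        objects = []
        while (i < len(lines) and lines[i].startswith(BEGIN)
               and lines[i].endswith(DASHES)):
            tag = lines[i][len(BEGIN):-len(DASHES)]
            end_line = END + tag + DASHES  # must use the same keyword
            i += 1
            body = []
            while i < len(lines) and lines[i] != end_line:
                body.append(lines[i])      # linefeeds are discarded on decode
                i += 1
            if i == len(lines):
                raise ValueError("unterminated object: %r" % tag)
            i += 1
            objects.append((tag, base64.b64decode("".join(body))))
        items.append((keyword, args, objects))
    return items
```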

When interpreting a Document, software MUST ignore any KeywordLine that starts with a keyword it doesn't recognize; future implementations MUST NOT require current clients to understand any KeywordLine not currently described.

Other implementations that want to extend Tor's directory format MAY introduce their own items. The keywords for extension items SHOULD start with the characters "x-" or "X-", to guarantee that they will not conflict with keywords used by future versions of Tor.

In our document descriptions below, we tag Items with a multiplicity in brackets. Possible tags are:

    "At start, exactly once": These items MUST occur in every instance of
      the document type, and MUST appear exactly once, and MUST be the
      first item in their documents.

    "Exactly once": These items MUST occur exactly one time in every
      instance of the document type.

    "At end, exactly once": These items MUST occur in every instance of
      the document type, and MUST appear exactly once, and MUST be the
      last item in their documents.

    "At most once": These items MAY occur zero or one times in any
      instance of the document type, but MUST NOT occur more than once.

    "Any number": These items MAY occur zero, one, or more times in any
      instance of the document type.

    "Once or more": These items MUST occur at least once in any instance
      of the document type, and MAY occur more.
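
The multiplicity tags above lend themselves to a mechanical check. The following Python sketch is illustrative only: check_multiplicity and the lower-case tag strings are hypothetical names chosen for this example.

```python
from collections import Counter


def check_multiplicity(keywords, rules):
    """keywords: Item keywords in document order.
    rules: dict mapping a keyword to its multiplicity tag.
    Returns a list of (keyword, tag, observed_count) violations."""
    counts = Counter(keywords)
    problems = []
    for keyword, tag in rules.items():
        n = counts[keyword]
        if tag == "at start, exactly once":
            ok = n == 1 and keywords[0] == keyword
        elif tag == "at end, exactly once":
            ok = n == 1 and keywords[-1] == keyword
        elif tag == "exactly once":
            ok = n == 1
        elif tag == "at most once":
            ok = n <= 1
        elif tag == "once or more":
            ok = n >= 1
        elif tag == "any number":
            ok = True
        else:
            raise ValueError("unknown tag: " + tag)
        if not ok:
            problems.append((keyword, tag, n))
    return problems
```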

For forward compatibility, each item MUST allow extra arguments at the end of the line unless otherwise noted. So if an item's description below is given as:

"thing" int int int NL

then implementations SHOULD accept this string as well:

"thing 5 9 11 13 16 12" NL

but not this string:

"thing 5" NL

and not this string:

"thing 5 10 thing" NL

Whenever an item DOES NOT allow extra arguments, we will tag it with "no extra arguments".
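
The "thing" example can be expressed as a short Python sketch. The function name parse_thing is hypothetical; the point is only that the described arguments are validated while anything after them is accepted and ignored.

```python
def parse_thing(line):
    """Parse '"thing" int int int NL', tolerating extra trailing arguments."""
    parts = line.split()
    if not parts or parts[0] != "thing":
        raise ValueError("not a 'thing' item")
    args = parts[1:]
    if len(args) < 3:
        raise ValueError("too few arguments")
    # Only the three described arguments are checked and interpreted.
    for arg in args[:3]:
        if not (arg.isascii() and arg.isdigit()):
            raise ValueError("not an int: " + arg)
    return tuple(int(arg) for arg in args[:3])
```

With this sketch, "thing 5 9 11 13 16 12" yields (5, 9, 11), while "thing 5" and "thing 5 10 thing" are both rejected.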

Signing documents

Every signable document below is signed in a similar manner, using a given "Initial Item", a final "Signature Item", a digest algorithm, and a signing key.

The Initial Item must be the first item in the document.

The Signature Item has the following format:

<signature item keyword> [arguments] NL SIGNATURE NL

The "SIGNATURE" Object contains a signature (using the signing key) of the PKCS#1 1.5 padded digest of the entire document, taken from the beginning of the Initial item, through the newline after the Signature Item's keyword and its arguments.

The signature does not include the algorithmIdentifier specified in PKCS #1.

Unless specified otherwise, the digest algorithm is SHA-1.

All documents are invalid unless signed with the correct signing key.

The "Digest" of a document, unless stated otherwise, is its digest as signed by this signature scheme.
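
As a non-normative illustration, the span of bytes covered by the digest can be located as in the Python sketch below. The function name document_digest is hypothetical, and the sketch only computes the digest; padding and the RSA operation itself are left out.

```python
import hashlib


def document_digest(document, initial_keyword, signature_keyword, algorithm="sha1"):
    """Digest from the start of the Initial Item through the newline that
    ends the Signature Item's keyword line. All inputs are bytes."""
    first = document[len(initial_keyword):len(initial_keyword) + 1]
    if not document.startswith(initial_keyword) or first not in (b" ", b"\t", b"\n"):
        raise ValueError("document does not start with the Initial Item")
    marker = b"\n" + signature_keyword
    pos = document.find(marker)
    # Skip keywords that merely share a prefix with the signature keyword.
    while pos != -1 and document[pos + len(marker):pos + len(marker) + 1] not in (b" ", b"\n"):
        pos = document.find(marker, pos + 1)
    if pos == -1:
        raise ValueError("no Signature Item found")
    end = document.index(b"\n", pos + 1) + 1  # include the closing newline
    return hashlib.new(algorithm, document[:end]).digest()
```

The SIGNATURE object that follows the keyword line is therefore never part of the digested data.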

Voting timeline

Every consensus document has a "valid-after" (VA) time, a "fresh-until" (FU) time and a "valid-until" (VU) time. VA MUST precede FU, which MUST in turn precede VU. Times are chosen so that every consensus will be "fresh" until the next consensus becomes valid, and "valid" for a while after. At least 3 consensuses should be valid at any given time.

The timeline for a given consensus is as follows:

VA-DistSeconds-VoteSeconds: The authorities exchange votes. Each authority uploads their vote to all other authorities.

VA-DistSeconds-VoteSeconds/2: The authorities try to download any votes they don't have.

Authorities SHOULD also reject any votes that other authorities try to upload after this time. (0.4.4.1-alpha was the first version to reject votes in this way.)

Note: Refusing late uploaded votes minimizes the chance of a consensus split, particularly when authorities are under bandwidth pressure. If an authority is struggling to upload its vote, and finally uploads to a fraction of authorities after this period, they will compute a consensus different from the others. By refusing uploaded votes after this time, we increase the likelihood that most authorities will use the same vote set.

Rejecting late uploaded votes does not fix the problem entirely. If some authorities are able to download a specific vote, but others fail to do so, then there may still be a consensus split. However, this change does remove one common cause of consensus splits.

VA-DistSeconds: The authorities calculate the consensus and exchange signatures. (This is the earliest point at which anybody can possibly get a given consensus if they ask for it.)

VA-DistSeconds/2: The authorities try to download any signatures they don't have.

VA: All authorities have a multiply signed consensus.

   VA ... FU: Caches download the consensus.  (Note that since caches have
        no way of telling what VA and FU are until they have downloaded
        the consensus, they assume that the present consensus's VA is
        equal to the previous one's FU, and that its FU is one interval after
        that.)

   FU: The consensus is no longer the freshest consensus.

   FU ... (the current consensus's VU): Clients download the consensus.
        (See note above: clients guess that the next consensus's FU will be
        two intervals after the current VA.)

VU: The consensus is no longer valid; clients should continue to try to download a new consensus if they have not done so already.

VU + 24 hours: Clients will no longer use the consensus at all.

VoteSeconds and DistSeconds MUST each be at least 20 seconds; FU-VA and VU-FU MUST each be at least 5 minutes.
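
The timeline arithmetic can be made concrete with a small Python sketch. It is illustrative only: voting_schedule and the dictionary keys are hypothetical names, and the sketch simply evaluates the offsets listed above after checking the stated minimums.

```python
from datetime import datetime, timedelta

MIN_PHASE = 20         # seconds; lower bound for VoteSeconds and DistSeconds
MIN_LIFETIME = 5 * 60  # seconds; lower bound for FU-VA and VU-FU


def voting_schedule(va, fu, vu, vote_seconds, dist_seconds):
    """Return the milestone times for one consensus as a dict."""
    if vote_seconds < MIN_PHASE or dist_seconds < MIN_PHASE:
        raise ValueError("VoteSeconds and DistSeconds must be >= 20")
    if (fu - va).total_seconds() < MIN_LIFETIME or (vu - fu).total_seconds() < MIN_LIFETIME:
        raise ValueError("FU-VA and VU-FU must be >= 5 minutes")
    vote = timedelta(seconds=vote_seconds)
    dist = timedelta(seconds=dist_seconds)
    return {
        "exchange_votes": va - dist - vote,
        "fetch_missing_votes": va - dist - vote / 2,
        "compute_consensus": va - dist,
        "fetch_missing_signatures": va - dist / 2,
        "valid_after": va,
        "fresh_until": fu,
        "valid_until": vu,
        "stop_using": vu + timedelta(hours=24),
    }
```

For example, with VA at 12:00:00 and VoteSeconds = DistSeconds = 300, votes are exchanged at 11:50:00 and the consensus is computed at 11:55:00.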

Router operation and formats

Uploading server descriptors and extra-info documents

ORs SHOULD generate a new server descriptor and a new extra-info document whenever any of the following events have occurred:

      - A period of time (18 hrs by default) has passed since the last
        time a descriptor was generated.

      - A descriptor field other than bandwidth or uptime has changed.

      - Its uptime is less than 24h and bandwidth has changed by a factor of 2
        from the last time a descriptor was generated, and at least a given
        interval of time (3 hours by default) has passed since then.

      - Its uptime has been reset (by restarting).

      - It receives a networkstatus consensus in which it is not listed.

      - It receives a networkstatus consensus in which it is listed
        with the StaleDesc flag.

      [XXX this list is incomplete; see router_differences_are_cosmetic()
       in routerlist.c for others]

ORs SHOULD NOT publish a new server descriptor or extra-info document if none of the above events have occurred and not much time has passed (12 hours by default).

Tor versions older than 0.3.5.1-alpha ignore uptime when checking for bandwidth changes.

After generating a descriptor, ORs upload them to every directory authority they know, by posting them (in order) to the URL

http://hostname:port/tor/

Server descriptors may not exceed 20,000 bytes in length; extra-info documents may not exceed 50,000 bytes in length. If they do, the authorities SHOULD reject them.

Server descriptor format

Server descriptors consist of the following items.

In lines that take multiple arguments, extra arguments SHOULD be accepted and ignored. Many of the nonterminals below are defined in section 2.1.3.

Note that many versions of Tor will generate an extra newline at the end of their descriptors. Implementations MUST tolerate one or more blank lines at the end of a single descriptor or a list of concatenated descriptors. New implementations SHOULD NOT generate such blank lines.

"router" nickname address ORPort SOCKSPort DirPort NL

[At start, exactly once.]

Indicates the beginning of a server descriptor. "nickname" must be a valid router nickname as specified in section 2.1.3. "address" must be an IPv4 address in dotted-quad format. The last three numbers indicate the TCP ports at which this OR exposes functionality. ORPort is a port at which this OR accepts TLS connections for the main OR protocol; SOCKSPort is deprecated and should always be 0; and DirPort is the port at which this OR accepts directory-related HTTP connections. If any port is not supported, the value 0 is given instead of a port number. (At least one of DirPort and ORPort SHOULD be set; authorities MAY reject any descriptor with both DirPort and ORPort of 0.)
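
A minimal, non-normative Python sketch of parsing this line follows. The function name parse_router_line and the dictionary keys are hypothetical; nickname validation (section 2.1.3) is omitted, and extra trailing arguments are ignored.

```python
import ipaddress


def parse_router_line(line):
    """Parse '"router" nickname address ORPort SOCKSPort DirPort NL'."""
    parts = line.split()
    if len(parts) < 6 or parts[0] != "router":
        raise ValueError("not a valid router line")
    nickname, address = parts[1], parts[2]
    # Raises a ValueError subclass unless the address is a dotted quad.
    ipaddress.IPv4Address(address)
    ports = []
    for field in parts[3:6]:
        if not (field.isascii() and field.isdigit()) or int(field) > 65535:
            raise ValueError("bad port: " + field)
        ports.append(int(field))  # 0 means "not supported"
    or_port, socks_port, dir_port = ports
    return {
        "nickname": nickname,
        "address": address,
        "or_port": or_port,
        "socks_port": socks_port,
        "dir_port": dir_port,
    }
```

An authority that chooses to reject descriptors with both ports unset would additionally test whether or_port and dir_port are both 0.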

    "identity-ed25519" NL "-----BEGIN ED25519 CERT-----" NL certificate
           "-----END ED25519 CERT-----" NL

[Exactly once, in second position in document.] [No extra arguments]

The certificate is a base64-encoded Ed25519 certificate (see cert-spec.txt) with terminating =s removed. When this element is present, it MUST appear as the first or second element in the router descriptor.

The certificate has CERT_TYPE of [04]. It must include a signed-with-ed25519-key extension (see cert-spec.txt, section 2.2.1), so that we can extract the master identity key.

[Before Tor 0.4.5.1-alpha, this field was optional.]

"master-key-ed25519" SP MasterKey NL

[Exactly once]

Contains the base-64 encoded ed25519 master key as a single argument. If it is present, it MUST match the identity key in the identity-ed25519 entry.

[Before Tor 0.4.5.1-alpha, this field was optional.]

"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed NL

[Exactly once]

Estimated bandwidth for this router, in bytes per second. The "average" bandwidth is the volume per second that the OR is willing to sustain over long periods; the "burst" bandwidth is the volume that the OR is willing to sustain in very short intervals. The "observed" value is an estimate of the capacity this relay can handle. The relay remembers the maximum sustained output bandwidth over any ten-second period in the past 5 days, and likewise the maximum sustained input bandwidth. The "observed" value is the lesser of these two numbers.

Tor versions released before 2018 only kept bandwidth-observed for one day. These versions are no longer supported or recommended.

"platform" string NL

[At most once]

A human-readable string describing the system on which this OR is running. This MAY include the operating system, and SHOULD include the name and version of the software implementing the Tor protocol.

"published" YYYY-MM-DD HH:MM:SS NL

[Exactly once]

The time, in UTC, when this descriptor (and its corresponding extra-info document if any) was generated.

"fingerprint" fingerprint NL

[At most once]

A fingerprint (a HASH_LEN-byte hash of the ASN.1-encoded public key, encoded in hex, with a single space after every 4 characters) for this router's identity key. A descriptor is considered invalid (and MUST be rejected) if the fingerprint line does not match the public key.

       [We didn't start parsing this line until Tor 0.1.0.6-rc; it should
        be marked with "opt" until earlier versions of Tor are obsolete.]
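
The fingerprint layout can be sketched in Python as follows. This is illustrative only: it assumes that HASH_LEN is 20 and the hash is SHA-1, that the input is the DER (ASN.1) encoding of the identity key, and that upper-case hex is used; format_fingerprint and fingerprint_matches are hypothetical names.

```python
import hashlib


def format_fingerprint(der_public_key):
    """Hex-encode the key hash with a space after every 4 characters."""
    # Assumption: SHA-1 (20 bytes), rendered in upper-case hex.
    digest_hex = hashlib.sha1(der_public_key).hexdigest().upper()
    return " ".join(digest_hex[i:i + 4] for i in range(0, len(digest_hex), 4))


def fingerprint_matches(fingerprint_argument, der_public_key):
    """Compare a fingerprint line's argument against the key, ignoring
    spacing and letter case."""
    claimed = fingerprint_argument.replace(" ", "").upper()
    return claimed == hashlib.sha1(der_public_key).hexdigest().upper()
```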

    "hibernating" bool NL

       [At most once]

If the value is 1, then the Tor relay was hibernating when the descriptor was published, and shouldn't be used to build circuits.

       [We didn't start parsing this line until Tor 0.1.0.6-rc; it should be
        marked with "opt" until earlier versions of Tor are obsolete.]

    "uptime" number NL

       [At most once]

       The number of seconds that this OR process has been running.

    "onion-key" NL a public key in PEM format

[Exactly once] [No extra arguments]

This key is used to encrypt CREATE cells for this OR. The key MUST be accepted for at least 1 week after any new key is published in a subsequent descriptor. It MUST be 1024 bits.

The key encoding is the encoding of the key as a PKCS#1 RSAPublicKey structure, encoded in base64, and wrapped in "-----BEGIN RSA PUBLIC KEY-----" and "-----END RSA PUBLIC KEY-----".

"onion-key-crosscert" NL an RSA signature in PEM format.

[Exactly once] [No extra arguments]

This element contains an RSA signature, generated using the onion-key, of the following:

          A SHA1 hash of the RSA identity key,
            i.e. RSA key from "signing-key" (see below) [20 bytes]
          The Ed25519 identity key,
            i.e. Ed25519 key from "master-key-ed25519" [32 bytes]

If there is no Ed25519 identity key, or if in some future version there is no RSA identity key, the corresponding field must be zero-filled.

Parties verifying this signature MUST allow additional data beyond the 52 bytes listed above.

This signature proves that the party creating the descriptor had control over the secret key corresponding to the onion-key.

[Before Tor 0.4.5.1-alpha, this field was optional whenever identity-ed25519 was absent.]
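
For illustration, the 52 bytes covered by the onion-key-crosscert signature could be assembled as in the Python sketch below. It assumes the SHA1 hash is taken over the DER encoding of the RSA identity key; the function name crosscert_payload is hypothetical, and the RSA signing step is not shown.

```python
import hashlib


def crosscert_payload(rsa_identity_der, ed25519_identity):
    """Build SHA1(RSA identity key) [20 bytes] || Ed25519 identity [32 bytes].
    Pass None for a missing key to get the zero-filled field."""
    if rsa_identity_der is None:
        rsa_part = bytes(20)
    else:
        # Assumption: the hash covers the DER-encoded RSA public key.
        rsa_part = hashlib.sha1(rsa_identity_der).digest()
    ed_part = bytes(32) if ed25519_identity is None else ed25519_identity
    if len(ed_part) != 32:
        raise ValueError("Ed25519 identity key must be 32 bytes")
    return rsa_part + ed_part
```

A verifier following the text above would compare only the first 52 bytes of the signed data and tolerate anything after them.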

"ntor-onion-key" base-64-encoded-key

[Exactly once]

A curve25519 public key used for the ntor circuit extended handshake. It's the standard encoding of the OR's curve25519 public key, encoded in base 64. The trailing '=' sign MAY be omitted from the base64 encoding. The key MUST be accepted for at least 1 week after any new key is published in a subsequent descriptor.

[Before Tor 0.4.5.1-alpha, this field was optional.]

    "ntor-onion-key-crosscert" SP Bit NL
           "-----BEGIN ED25519 CERT-----" NL certificate
           "-----END ED25519 CERT-----" NL

[Exactly once] [No extra arguments]

A signature created with the ntor-onion-key, using the certificate format documented in cert-spec.txt, with type [0a]. The signed key here is the master identity key.

Bit must be "0" or "1". It indicates the sign of the ed25519 public key corresponding to the ntor onion key. If Bit is "0", then implementations MUST guarantee that the x-coordinate of the resulting ed25519 public key is positive. Otherwise, if Bit is "1", then the sign of the x-coordinate MUST be negative.

To compute the ed25519 public key corresponding to a curve25519 key, and for further explanation on key formats, see appendix C.

This signature proves that the party creating the descriptor had control over the secret key corresponding to the ntor-onion-key.

[Before Tor 0.4.5.1-alpha, this field was optional whenever identity-ed25519 was absent.]

"signing-key" NL a public key in PEM format

[Exactly once] [No extra arguments]

The OR's long-term RSA identity key. It MUST be 1024 bits.

The encoding is as for "onion-key" above.

    "accept" exitpattern NL
    "reject" exitpattern NL

[Any number]

These lines describe an "exit policy": the rules that an OR follows when deciding whether to allow a new stream to a given address. The 'exitpattern' syntax is described below. There MUST be at least one such entry. The rules are considered in order; if no rule matches, the address will be accepted. For clarity, the last such entry SHOULD be accept *:* or reject *:*.

"ipv6-policy" SP ("accept" / "reject") SP PortList NL

[At most once.]

An exit-policy summary as specified in sections 3.4.1 and 3.8.2, summarizing the router's rules for connecting to IPv6 addresses. A missing "ipv6-policy" line is equivalent to "ipv6-policy reject 1-65535".

"overload-general" SP version SP YYYY-MM-DD HH:MM:SS NL

[At most once.]

Indicates that a relay has reached an "overloaded state" which can be one or many of the following load metrics:

         - Any OOM invocation due to memory pressure
         - Any ntor onionskins are dropped
         - TCP port exhaustion

The timestamp is when at least one of these metrics was detected. It should always be at the hour; for example, "2020-01-10 13:00:00" is an expected timestamp. Because this is a binary state, if the line is present, we consider that the overloaded state was hit at least once somewhere between the provided timestamp and the "published" timestamp of the document, which is when the document was generated.

The overload-general line should remain in place for 72 hours since last triggered. If the limits are reached again in this period, the timestamp is updated, and this 72 hour period restarts.

The 'version' field is set to '1' for now.

      (Introduced in tor-0.4.6.1-alpha, but moved from extra-info to general
       descriptor in tor-0.4.6.2-alpha)
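
The timestamp and 72-hour rules can be sketched in Python as follows. This is illustrative only: the function name overload_general_line is hypothetical, and truncating the detection time down to the start of its hour is an assumption about how "at the hour" is achieved.

```python
from datetime import datetime, timedelta

OVERLOAD_PERIOD = timedelta(hours=72)


def overload_general_line(last_triggered, now):
    """Return the overload-general line to publish, or None if the relay
    has not been overloaded within the last 72 hours."""
    if last_triggered is None or now - last_triggered >= OVERLOAD_PERIOD:
        return None
    # Assumption: the timestamp is truncated to the start of the hour.
    stamp = last_triggered.replace(minute=0, second=0, microsecond=0)
    return "overload-general 1 " + stamp.strftime("%Y-%m-%d %H:%M:%S")
```

Each new overload event replaces last_triggered, which restarts the 72-hour period.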

    "router-sig-ed25519" SP Signature NL

       [Exactly once.]

It MUST be the next-to-last element in the descriptor, appearing immediately before the RSA signature. It MUST contain an Ed25519 signature of a SHA256 digest of the entire document. This digest is taken from the first character up to and including the first space after the "router-sig-ed25519" string. Before computing the digest, the string "Tor router descriptor signature v1" is prefixed to the document.

The signature is encoded in Base64, with terminating =s removed.

The signing key in the identity-ed25519 certificate MUST be the one used to sign the document.

[Before Tor 0.4.5.1-alpha, this field was optional whenever identity-ed25519 was absent.]
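
As a non-normative sketch, the digest that the Ed25519 signature covers, and the decoding of the unpadded Base64 signature, might look like this in Python. The function names are hypothetical, and verifying the signature itself is out of scope here.

```python
import base64
import hashlib

SIGNATURE_PREFIX = b"Tor router descriptor signature v1"


def ed25519_signed_digest(descriptor):
    """SHA256 over the prefix string plus the descriptor, from its first
    byte through the first space after "router-sig-ed25519"."""
    marker = b"\nrouter-sig-ed25519 "
    pos = descriptor.find(marker)
    if pos == -1:
        raise ValueError("no router-sig-ed25519 line")
    end = pos + len(marker)  # the space after the keyword is included
    return hashlib.sha256(SIGNATURE_PREFIX + descriptor[:end]).digest()


def decode_unpadded_base64(text):
    """Decode Base64 whose terminating '=' characters were removed."""
    return base64.b64decode(text + "=" * (-len(text) % 4))
```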

"router-signature" NL Signature NL

[At end, exactly once] [No extra arguments]

The "SIGNATURE" object contains a signature of the PKCS1-padded hash of the entire server descriptor, taken from the beginning of the "router" line, through the newline after the "router-signature" line. The server descriptor is invalid unless the signature is performed with the router's identity key.

"contact" info NL

[At most once]

Describes a way to contact the relay's administrator, preferably including an email address and a PGP key fingerprint.

"bridge-distribution-request" SP Method NL

[At most once, bridges only.]

The "Method" describes how a Bridge address is distributed by BridgeDB. Recognized methods are: "none", "any", "https", "email", "moat". If set to "none", BridgeDB will avoid distributing your bridge address. If set to "any", BridgeDB will choose how to distribute your bridge address. Choosing any of the other methods will tell BridgeDB to distribute your bridge via a specific method:

          - "https" specifies distribution via the web interface at
            https://bridges.torproject.org;
          - "email" specifies distribution via the email autoresponder at
            bridges@torproject.org; and
          - "moat" specifies distribution via an interactive menu inside Tor
            Browser.

        Potential future "Method" specifiers must be as follows:
            Method = (KeywordChar | "_") +

All bridges SHOULD include this line. Non-bridges MUST NOT include it.

BridgeDB SHOULD treat unrecognized Method values as if they were "none".

(Default: "any")

[This line was introduced in 0.3.2.3-alpha, with a minimal backport to 0.2.5.16, 0.2.8.17, 0.2.9.14, 0.3.0.13, 0.3.1.9, and later.]
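
How a consumer such as BridgeDB might interpret this line can be sketched in Python. The function name effective_method is hypothetical, and the sketch is not taken from the BridgeDB source.

```python
import re

RECOGNIZED_METHODS = {"none", "any", "https", "email", "moat"}
METHOD_RE = re.compile(r"[A-Za-z0-9_-]+")  # Method = (KeywordChar | "_")+


def effective_method(line):
    """Return the distribution method to apply for a
    bridge-distribution-request line."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "bridge-distribution-request":
        raise ValueError("not a bridge-distribution-request line")
    method = parts[1]
    if not METHOD_RE.fullmatch(method):
        raise ValueError("malformed Method: " + method)
    # Unrecognized but well-formed values are treated as "none".
    return method if method in RECOGNIZED_METHODS else "none"
```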

"family" names NL

[At most once]

'Names' is a space-separated list of relay nicknames or hexdigests. If two ORs list one another in their "family" entries, then OPs should treat them as a single OR for the purpose of path selection.

For example, if node A's descriptor contains "family B", and node B's descriptor contains "family A", then node A and node B should never be used on the same circuit.
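
The mutual-listing requirement can be illustrated with a short Python sketch. The data layout is hypothetical: each descriptor is reduced to a set of names by which it may be referenced (nickname or hexdigest) and the list of names on its "family" line.

```python
def lists_member(descriptor, other):
    """True if descriptor's family line names `other`."""
    return any(name in other["names"] for name in descriptor["family"])


def same_family(a, b):
    """Two ORs count as one family only if each lists the other."""
    return lists_member(a, b) and lists_member(b, a)
```

A one-sided declaration has no effect under this rule, so a relay cannot place itself in another relay's family unilaterally.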

    "read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
        [At most once]
    "write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
        [At most once]

(These fields once appeared in router descriptors, but have appeared in extra-info descriptors since 0.2.0.x.)

"eventdns" bool NL

[At most once]

Declare whether this version of Tor is using the newer enhanced dns logic. Versions of Tor with this field set to false SHOULD NOT be used for reverse hostname lookups.

        [This option is obsolete.  All current Tor relays should be presumed
         to have the evdns backend.]

   "caches-extra-info" NL

[At most once.] [No extra arguments]

Present only if this router is a directory cache that provides extra-info documents.

[Versions before 0.2.0.1-alpha don't recognize this]

"extra-info-digest" SP sha1-digest [SP sha256-digest] NL

[At most once]

"sha1-digest" is a hex-encoded SHA1 digest (using upper-case characters) of the router's extra-info document, as signed in the router's extra-info (that is, not including the signature). (If this field is absent, the router is not uploading a corresponding extra-info document.)

"sha256-digest" is a base64-encoded SHA256 digest of the extra-info document. Unlike the "sha1-digest", this digest is calculated over the entire document, including the signature. This difference is due to a long-lived bug in the tor implementation that it would be difficult to roll out an incremental fix for, not a design choice. Future digest algorithms specified should not include the signature in the data used to compute the digest.

[Versions before 0.2.7.2-alpha did not include a SHA256 digest.] [Versions before 0.2.0.1-alpha don't recognize this field at all.]
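
The difference between the two digests can be shown with a Python sketch. It rests on assumptions that are not stated in this passage: that the extra-info document's Signature Item keyword is "router-signature", and that the trailing '=' of the Base64 SHA256 digest is removed. The function name extra_info_digests is hypothetical.

```python
import base64
import hashlib


def extra_info_digests(extra_info, signature_keyword=b"router-signature",
                       strip_padding=True):
    """Return (sha1-digest, sha256-digest) strings for an extra-info
    document given as bytes."""
    marker = b"\n" + signature_keyword + b"\n"
    pos = extra_info.find(marker)
    if pos == -1:
        raise ValueError("no signature item found")
    signed_part = extra_info[:pos + len(marker)]  # excludes the signature
    sha1_hex = hashlib.sha1(signed_part).hexdigest().upper()
    # The SHA256 digest covers the entire document, signature included.
    sha256_b64 = base64.b64encode(hashlib.sha256(extra_info).digest()).decode()
    if strip_padding:  # assumption: padding is dropped
        sha256_b64 = sha256_b64.rstrip("=")
    return sha1_hex, sha256_b64
```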

"hidden-service-dir" NL

[At most once.]

Present only if this router stores and serves hidden service descriptors. This router supports the descriptor versions declared in the HSDir "proto" entry. If there is no "proto" entry, this router supports version 2 descriptors.

   "protocols" SP "Link" SP LINK-VERSION-LIST SP "Circuit" SP
          CIRCUIT-VERSION-LIST NL

       [At most once.]

An obsolete list of protocol versions, superseded by the "proto" entry. This list was never parsed, and has not been emitted since Tor 0.2.9.4-alpha. New code should neither generate nor parse this line.

"allow-single-hop-exits" NL

[At most once.] [No extra arguments]

Present only if the router allows single-hop circuits to make exit connections. Most Tor relays do not support this: this is included for specialized controllers designed to support perspective access and such. This is obsolete in tor version >= 0.3.1.0-alpha.

"or-address" SP ADDRESS ":" PORT NL

[Any number]

    ADDRESS = IP6ADDR | IP4ADDR
    IP6ADDR = an ipv6 address, surrounded by square brackets.
    IP4ADDR = an ipv4 address, represented as a dotted quad.
    PORT = a number between 1 and 65535 inclusive.

An alternative for the address and ORPort of the "router" line, but with two added capabilities:

         * or-address can be either an IPv4 or IPv6 address
         * or-address allows for multiple ORPorts and addresses

A descriptor SHOULD NOT include an or-address line that does nothing but duplicate the address:port pair from its "router" line.

The ordering of or-address lines and their PORT entries matters because Tor MAY accept a limited number of address/port pairs. As of Tor 0.2.3.x only the first address/port pair is advertised and used.
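
A minimal Python sketch of parsing an or-address line follows. The function name parse_or_address is hypothetical, and the sketch is not normative.

```python
import ipaddress


def parse_or_address(line):
    """Parse '"or-address" SP ADDRESS ":" PORT NL' into (address, port)."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "or-address":
        raise ValueError("not an or-address line")
    # Split at the last colon so that IPv6 colons stay in the address.
    address, colon, port_text = parts[1].rpartition(":")
    if not colon:
        raise ValueError("missing port")
    if address.startswith("[") and address.endswith("]"):
        ip = ipaddress.IPv6Address(address[1:-1])
    else:
        ip = ipaddress.IPv4Address(address)
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError("bad port: " + port_text)
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: " + port_text)
    return ip, port
```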

"tunnelled-dir-server" NL

[At most once.] [No extra arguments]

       Present if the router accepts "tunneled" directory requests using a
       BEGIN_DIR cell over the router's OR port.
          (Added in 0.2.8.1-alpha. Before this, Tor relays accepted
          tunneled directory requests only if they had a DirPort open,
          or if they were bridges.)

   "proto" SP Entries NL

       [Exactly once.]

    Entries =
    Entries = Entry
    Entries = Entry SP Entries

    Entry = Keyword "=" Values

    Values =
    Values = Value
    Values = Value "," Values

    Value = Int
    Value = Int "-" Int

    Int = NON_ZERO_DIGIT
    Int = Int DIGIT

Each 'Entry' in the "proto" line indicates that the Tor relay supports one or more versions of the protocol in question. Entries should be sorted by keyword. Values should be numerically ascending within each entry. (This implies that there should be no overlapping ranges.) Ranges should be represented as compactly as possible. Ints must be no larger than 63.

This field was first added in Tor 0.2.9.x.

[Before Tor 0.4.5.1-alpha, this field was optional.]
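
The "proto" grammar can be read into a mapping from protocol name to supported versions, as in the Python sketch below. The function name parse_proto is hypothetical; the sketch enforces the limit of 63 but does not check sorting or compactness.

```python
def parse_proto(line):
    """Parse '"proto" SP Entries NL' into {keyword: set of versions}."""
    parts = line.split()
    if not parts or parts[0] != "proto":
        raise ValueError("not a proto line")
    supported = {}
    for entry in parts[1:]:
        keyword, equals, values = entry.partition("=")
        if not keyword or not equals:
            raise ValueError("bad Entry: " + entry)
        versions = set()
        if values:  # Values may be empty
            for value in values.split(","):
                low_text, dash, high_text = value.partition("-")
                low = int(low_text)  # int() raises ValueError if malformed
                high = int(high_text) if dash else low
                if low > high or high > 63:
                    raise ValueError("bad Value: " + value)
                versions.update(range(low, high + 1))
        supported[keyword] = versions
    return supported
```

For example, "proto Link=1-5 Relay=1,3" yields Link versions 1 through 5 and Relay versions 1 and 3.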

Extra-info document format

Extra-info documents consist of the following items:

    "extra-info" Nickname Fingerprint NL
        [At start, exactly once.]

Identifies what router this is an extra-info descriptor for. Fingerprint is encoded in hex (using upper-case letters), with no spaces.

    "identity-ed25519"
        [As in router descriptors]

    "published" YYYY-MM-DD HH:MM:SS NL

       [Exactly once.]

The time, in UTC, when this document (and its corresponding router descriptor if any) was generated. It MUST match the published time in the corresponding server descriptor.

    "read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
        [At most once.]
    "write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL
        [At most once.]

Declare how much bandwidth the OR has used recently. Usage is divided into intervals of NSEC seconds. The YYYY-MM-DD HH:MM:SS field defines the end of the most recent interval. The numbers are the number of bytes used in the most recent intervals, ordered from oldest to newest.

These fields include both IPv4 and IPv6 traffic.
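
To illustrate how the timestamp, NSEC, and the list of numbers relate, the Python sketch below expands a history line into (interval end, bytes) pairs. The function name parse_history is hypothetical, and the sketch is not normative.

```python
import re
from datetime import datetime, timedelta

HISTORY_RE = re.compile(
    r"(\S*-history) (\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \((\d+) s\)(?: ([0-9,]+))?"
)


def parse_history(line):
    """Return a list of (interval_end, byte_count), oldest first."""
    match = HISTORY_RE.match(line)
    if match is None:
        raise ValueError("not a history line")
    last_end = datetime.strptime(match.group(2), "%Y-%m-%d %H:%M:%S")
    nsec = int(match.group(3))
    values = [int(v) for v in (match.group(4) or "").split(",") if v]
    count = len(values)
    # The final value ends at last_end; each earlier one ends NSEC sooner.
    return [
        (last_end - timedelta(seconds=nsec * (count - 1 - i)), value)
        for i, value in enumerate(values)
    ]
```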

    "ipv6-read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]
    "ipv6-write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]

Declare how much bandwidth the OR has used recently, on IPv6 connections. See "read-history" and "write-history" for full details.

    "geoip-db-digest" Digest NL
        [At most once.]

SHA1 digest of the IPv4 GeoIP database file that is used to resolve IPv4 addresses to country codes.

    "geoip6-db-digest" Digest NL
        [At most once.]

SHA1 digest of the IPv6 GeoIP database file that is used to resolve IPv6 addresses to country codes.

    ("geoip-start-time" YYYY-MM-DD HH:MM:SS NL)
    ("geoip-client-origins" CC=NUM,CC=NUM,... NL)

Only generated by bridge routers (see blocking.pdf), and only when they have been configured with a geoip database. Non-bridges SHOULD NOT generate these fields. Contains a list of mappings from two-letter country codes (CC) to the number of clients that have connected to that bridge from that country (approximate, and rounded up to the nearest multiple of 8 in order to hamper traffic analysis). A country is included only if it has at least one address. The time in "geoip-start-time" is the time at which we began collecting geoip statistics.

"geoip-start-time" and "geoip-client-origins" have been replaced by "bridge-stats-end" and "bridge-ips" in 0.2.2.4-alpha. The reason is that the measurement interval with "geoip-stats" as determined by subtracting "geoip-start-time" from "published" could have had a variable length, whereas the measurement interval in 0.2.2.4-alpha and later is set to be exactly 24 hours long. In order to clearly distinguish the new measurement intervals from the old ones, the new keywords have been introduced.

    "bridge-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

A "bridge-stats-end" line, as well as any other "bridge-*" line, is only added when the relay has been running as a bridge for at least 24 hours.

    "bridge-ips" CC=NUM,CC=NUM,... NL
        [At most once.]

List of mappings from two-letter country codes to the number of unique IP addresses that have connected from that country to the bridge and that are not known relays, rounded up to the nearest multiple of 8.
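
The rounding rule can be sketched in Python as follows. The function names are hypothetical, and sorting the entries by country code is an assumption made only to give the example a deterministic output.

```python
def round_up(count, multiple=8):
    """Round count up to the nearest multiple (8 stays 8, 9 becomes 16)."""
    return -(-count // multiple) * multiple


def format_bridge_ips(unique_ips_by_country):
    """Build a bridge-ips line from {country_code: unique IP count}."""
    entries = [
        cc + "=" + str(round_up(count))
        # Assumption: entries are emitted in sorted order.
        for cc, count in sorted(unique_ips_by_country.items())
    ]
    return "bridge-ips " + ",".join(entries)
```

Because every reported value is a multiple of 8, a reader of the descriptor cannot distinguish, say, 1 client from 8 clients in a given country.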

    "bridge-ip-versions" FAM=NUM,FAM=NUM,... NL
        [At most once.]

List of mappings from protocol family to the number of unique IP addresses that have connected to the bridge using that family.

    "bridge-ip-transports" PT=NUM,PT=NUM,... NL
        [At most once.]

List of mappings from pluggable transport names to the number of unique IP addresses that have connected using that pluggable transport. Unobfuscated connections are counted using the reserved pluggable transport name "<OR>" (without quotes). If we received a connection from a transport proxy but we couldn't figure out the name of the pluggable transport, we use the reserved pluggable transport name "<??>".

("<OR>" and "<??>" are reserved because normal pluggable transport names MUST match the following regular expression: "[a-zA-Z_][a-zA-Z0-9_]*" )

The pluggable transport name list is sorted into lexically ascending order.

If no clients have connected to the bridge yet, we only write "bridge-ip-transports" to the stats file.

    "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

A "dirreq-stats-end" line, as well as any other "dirreq-*" line, is only added when the relay has opened its Dir port and after 24 hours of measuring directory requests.

    "dirreq-v2-ips" CC=NUM,CC=NUM,... NL
        [At most once.]
    "dirreq-v3-ips" CC=NUM,CC=NUM,... NL
        [At most once.]

List of mappings from two-letter country codes to the number of unique IP addresses that have connected from that country to request a v2/v3 network status, rounded up to the nearest multiple of 8. Only those IP addresses are counted that the directory can answer with a 200 OK status code. (Note here and below: current Tor versions, as of 0.2.5.2-alpha, no longer cache or serve v2 networkstatus documents.)

    "dirreq-v2-reqs" CC=NUM,CC=NUM,... NL
        [At most once.]
    "dirreq-v3-reqs" CC=NUM,CC=NUM,... NL
        [At most once.]

List of mappings from two-letter country codes to the number of requests for v2/v3 network statuses from that country, rounded up to the nearest multiple of 8. Only those requests are counted that the directory can answer with a 200 OK status code.

    "dirreq-v2-share" NUM% NL
        [At most once.]
    "dirreq-v3-share" NUM% NL
        [At most once.]

The share of v2/v3 network status requests that the directory expects to receive from clients based on its advertised bandwidth compared to the overall network bandwidth capacity. Shares are formatted in percent with two decimal places. Shares are calculated as means over the whole 24-hour interval.

    "dirreq-v2-resp" status=NUM,... NL
        [At most once.]
    "dirreq-v3-resp" status=NUM,... NL
        [At most once.]

List of mappings from response statuses to the number of requests for v2/v3 network statuses that were answered with that response status, rounded up to the nearest multiple of 4. Only response statuses with at least 1 response are reported. New response statuses can be added at any time. The current list of response statuses is as follows:

        "ok": a network status request is answered; this number
           corresponds to the sum of all requests as reported in
           "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before
           rounding up.
        "not-enough-sigs": a version 3 network status is not signed by a
           sufficient number of requested authorities.
        "unavailable": a requested network status object is unavailable.
        "not-found": a requested network status is not found.
        "not-modified": a network status has not been modified since the
           If-Modified-Since time that is included in the request.
        "busy": the directory is busy.

    "dirreq-v2-direct-dl" key=NUM,... NL
        [At most once.]
    "dirreq-v3-direct-dl" key=NUM,... NL
        [At most once.]
    "dirreq-v2-tunneled-dl" key=NUM,... NL
        [At most once.]
    "dirreq-v3-tunneled-dl" key=NUM,... NL
        [At most once.]

List of statistics about possible failures in the download process of v2/v3 network statuses. Requests are either "direct" HTTP-encoded requests over the relay's directory port, or "tunneled" requests using a BEGIN_DIR cell over the relay's OR port. The list of possible statistics can change, and statistics can be left out from reporting. The current list of statistics is as follows:

Successful downloads and failures:

        "complete": a client has finished the download successfully.
        "timeout": a download did not finish within 10 minutes after
           starting to send the response.
        "running": a download is still running at the end of the
           measurement period, having started to send the response less
           than 10 minutes earlier.

        Download times:

        "min", "max": smallest and largest measured bandwidth in B/s.
        "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured
           bandwidth in B/s. For a given decile i, i/10 of all downloads
           had a smaller bandwidth than di, and (10-i)/10 of all downloads
           had a larger bandwidth than di.
        "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One
           fourth of all downloads had a smaller bandwidth than q1, one
           fourth of all downloads had a larger bandwidth than q3, and the
           remaining half of all downloads had a bandwidth between q1 and
           q3.
        "md": median of measured bandwidth in B/s. Half of the downloads
           had a smaller bandwidth than md, the other half had a larger
           bandwidth than md.

    "dirreq-read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]
    "dirreq-write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM... NL
        [At most once]

Declare how much bandwidth the OR has spent on answering directory requests. Usage is divided into intervals of NSEC seconds. The YYYY-MM-DD HH:MM:SS field defines the end of the most recent interval. The numbers are the number of bytes used in the most recent intervals, ordered from oldest to newest.

    "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

An "entry-stats-end" line, as well as any other "entry-*" line, is first added after the relay has been running for at least 24 hours.

    "entry-ips" CC=NUM,CC=NUM,... NL
        [At most once.]

List of mappings from two-letter country codes to the number of unique IP addresses that have connected from that country to the relay and that are not known to belong to other relays, rounded up to the nearest multiple of 8.
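
The rounding rule above can be sketched in Python as follows. The helper name round_up_to_multiple is hypothetical and is not taken from the Tor source. How a count of zero is reported is an assumption here, since the text does not say.

```python
def round_up_to_multiple(count: int, multiple: int) -> int:
    """Round a non-negative count up to the nearest multiple of `multiple`."""
    if count < 0 or multiple <= 0:
        raise ValueError("count must be >= 0 and multiple must be > 0")
    # Ceiling division using only integer arithmetic.
    return -(-count // multiple) * multiple


# "entry-ips" bins unique addresses per country in multiples of 8.
print(round_up_to_multiple(13, 8))  # prints 16
```

The same helper would apply to the other rounded statistics in this section, such as "exit-streams-opened" with a multiple of 4.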

    "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

A "cell-stats-end" line, as well as any other "cell-*" line, is first added after the relay has been running for at least 24 hours.

    "cell-processed-cells" NUM,...,NUM NL
        [At most once.]

Mean number of processed cells per circuit, subdivided into deciles of circuits by the number of cells they have processed in descending order from loudest to quietest circuits.

    "cell-queued-cells" NUM,...,NUM NL
        [At most once.]

Mean number of cells contained in queues by circuit decile. These means are calculated by 1) determining the mean number of cells in a single circuit between its creation and its termination and 2) calculating the mean for all circuits in a given decile as determined in "cell-processed-cells". Numbers have a precision of two decimal places.

Note that this statistic can be inaccurate for circuits that had queued cells at the start or end of the measurement interval.

    "cell-time-in-queue" NUM,...,NUM NL
        [At most once.]

Mean time cells spend in circuit queues in milliseconds. Times are calculated by 1) determining the mean time cells spend in the queue of a single circuit and 2) calculating the mean for all circuits in a given decile as determined in "cell-processed-cells".

Note that this statistic can be inaccurate for circuits that had queued cells at the start or end of the measurement interval.

    "cell-circuits-per-decile" NUM NL
        [At most once.]

Mean number of circuits that are included in any of the deciles, rounded up to the next integer.

    "conn-bi-direct" YYYY-MM-DD HH:MM:SS (NSEC s) BELOW,READ,WRITE,BOTH NL
        [At most once]

Number of connections, split into 10-second intervals, that are used uni-directionally or bi-directionally as observed in the NSEC seconds (usually 86400 seconds) before YYYY-MM-DD HH:MM:SS. Every 10 seconds, we determine for every connection whether we read and wrote less than a threshold of 20 KiB (BELOW), read at least 10 times more than we wrote (READ), wrote at least 10 times more than we read (WRITE), or read and wrote more than the threshold, but not 10 times more in either direction (BOTH). After classifying a connection, read and write counters are reset for the next 10-second interval.
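
A minimal Python sketch of the per-interval classification described above. The function name is hypothetical. Two details are assumptions rather than statements of the text: the 20 KiB threshold is compared against the sum of bytes read and written, and the READ test is applied before the WRITE test.

```python
THRESHOLD_BYTES = 20 * 1024  # 20 KiB threshold from the text
FACTOR = 10                  # "at least 10 times more"


def classify_connection(bytes_read: int, bytes_written: int) -> str:
    """Classify one connection for one 10-second interval."""
    # Assumption: the threshold applies to read + written combined.
    if bytes_read + bytes_written < THRESHOLD_BYTES:
        return "BELOW"
    if bytes_read >= FACTOR * bytes_written:
        return "READ"
    if bytes_written >= FACTOR * bytes_read:
        return "WRITE"
    return "BOTH"
```

A caller would run this once per connection every 10 seconds, add one to the matching counter, and then reset that connection's byte counters.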

This measurement includes both IPv4 and IPv6 connections.

    "ipv6-conn-bi-direct" YYYY-MM-DD HH:MM:SS (NSEC s) BELOW,READ,WRITE,BOTH NL
        [At most once]

Number of IPv6 connections that are used uni-directionally or bi-directionally. See "conn-bi-direct" for more details.

    "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

An "exit-stats-end" line, as well as any other "exit-*" line, is first added after the relay has been running for at least 24 hours and only if the relay permits exiting (where exiting to a single port and IP address is sufficient).

    "exit-kibibytes-written" port=N,port=N,... NL
        [At most once.]
    "exit-kibibytes-read" port=N,port=N,... NL
        [At most once.]

List of mappings from ports to the number of kibibytes that the relay has written to or read from exit connections to that port, rounded up to the next full kibibyte. Relays may limit the number of listed ports and subsume any remaining kibibytes under port "other".

    "exit-streams-opened" port=N,port=N,... NL
        [At most once.]

List of mappings from ports to the number of opened exit streams to that port, rounded up to the nearest multiple of 4. Relays may limit the number of listed ports and subsume any remaining opened streams under port "other".

    "hidserv-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]
    "hidserv-v3-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default).

A "hidserv-stats-end" line, as well as any other "hidserv-*" line, is first added after the relay has been running for at least 24 hours.

(Introduced in tor-0.4.6.1-alpha)

    "hidserv-rend-relayed-cells" SP NUM SP key=val SP key=val ... NL
        [At most once.]
    "hidserv-rend-v3-relayed-cells" SP NUM SP key=val SP key=val ... NL
        [At most once.]

Approximate number of RELAY cells seen in either direction on a circuit after receiving and successfully processing a RENDEZVOUS1 cell.

The original measurement value is obfuscated in several steps: first, it is rounded up to the nearest multiple of 'bin_size' which is reported in the key=val part of this line; second, a (possibly negative) noise value is added to the result of the first step by randomly sampling from a Laplace distribution with mu = 0 and b = (delta_f / epsilon) with 'delta_f' and 'epsilon' being reported in the key=val part, too; third, the result of the previous obfuscation steps is truncated to the next smaller integer and included as 'NUM'. Note that the overall reported value can be negative.
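
The three obfuscation steps can be sketched in Python as below. This is an illustration under stated assumptions, not Tor's implementation: the function names are hypothetical, and the Laplace sample is drawn as the difference of two exponential variates from Python's non-cryptographic random module, which is a stand-in for whatever sampler an implementation actually uses.

```python
import math
import random


def laplace_noise(delta_f: float, epsilon: float, rng=random) -> float:
    """Sample Laplace(mu=0, b=delta_f/epsilon) noise."""
    b = delta_f / epsilon
    # The difference of two independent exponential variates with mean b
    # follows a Laplace distribution with location 0 and scale b.
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def obfuscate(value: int, bin_size: int, noise: float) -> int:
    """Apply the binning, noise and truncation steps to a raw value."""
    # Step 1: round up to the nearest multiple of bin_size.
    binned = -(-value // bin_size) * bin_size
    # Steps 2 and 3: add the noise, then take the next smaller integer.
    return math.floor(binned + noise)


# Illustrative parameters only; real values come from the key=val part.
reported = obfuscate(1001, bin_size=1024, noise=laplace_noise(2048, 0.3))
```

Because the noise can be negative and larger in magnitude than the binned value, the result can be below zero, which matches the note above.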

(Introduced in tor-0.4.6.1-alpha)

    "hidserv-dir-onions-seen" SP NUM SP key=val SP key=val ... NL
        [At most once.]
    "hidserv-dir-v3-onions-seen" SP NUM SP key=val SP key=val ... NL
        [At most once.]

Approximate number of unique hidden-service identities seen in descriptors published to and accepted by this hidden-service directory.

The original measurement value is obfuscated in the same way as the 'NUM' value reported in "hidserv-rend-relayed-cells", but possibly with different parameters as reported in the key=val part of this line. Note that the overall reported value can be negative.

(Introduced in tor-0.4.6.1-alpha)

    "transport" transportname address:port [arglist] NL
        [Any number.]

Signals that the router supports the 'transportname' pluggable transport at IP address 'address' and TCP port 'port'. A single descriptor MUST NOT have more than one transport line with the same 'transportname'.

Pluggable transports are only relevant to bridges, but these entries can appear in non-bridge relays as well.

    "padding-counts" YYYY-MM-DD HH:MM:SS (NSEC s) key=NUM key=NUM ... NL
        [At most once.]

YYYY-MM-DD HH:MM:SS defines the end of the included measurement interval of length NSEC seconds (86400 seconds by default). Counts are reset to 0 at the end of this interval.

The keyword list is currently as follows:

         bin-size
           - The current rounding value for cell count fields (10000 by
             default)
         write-drop
           - The number of RELAY_DROP cells this relay sent
         write-pad
           - The number of CELL_PADDING cells this relay sent
         write-total
           - The total number of cells this relay sent
         read-drop
           - The number of RELAY_DROP cells this relay received
         read-pad
           - The number of CELL_PADDING cells this relay received
         read-total
           - The total number of cells this relay received
         enabled-read-pad
           - The number of CELL_PADDING cells this relay received on
             connections that support padding
         enabled-read-total
           - The total number of cells this relay received on connections
             that support padding
         enabled-write-pad
           - The number of CELL_PADDING cells this relay sent on
             connections that support padding
         enabled-write-total
           - The total number of cells sent by this relay on connections
             that support padding
         max-chanpad-timers
           - The maximum number of timers that this relay scheduled for
             padding in the previous NSEC interval

    "overload-ratelimits" SP version SP YYYY-MM-DD SP HH:MM:SS
                      SP rate-limit SP burst-limit
                      SP read-overload-count SP write-overload-count NL
        [At most once.]

        Indicates that a bandwidth limit was exhausted for this relay.

The "rate-limit" and "burst-limit" are the raw values from the BandwidthRate and BandwidthBurst found in the torrc configuration file.

The "{read|write}-overload-count" are the counts of how many times the reported limits of burst/rate were exhausted and thus the maximum between the read and write count occurrences. To make the counter more meaningful and to avoid multiple connections saturating the counter when a relay is overloaded, we only increment it once a minute.

The 'version' field is set to '1' for now.

(Introduced in tor-0.4.6.1-alpha)

    "overload-fd-exhausted" SP version YYYY-MM-DD HH:MM:SS NL
        [At most once.]

Indicates that a file descriptor exhaustion was experienced by this relay.

The timestamp indicates that the maximum was reached between the timestamp and the "published" timestamp of the document.

This overload field should remain in place for 72 hours since last triggered. If the limits are reached again in this period, the timestamp is updated, and this 72 hour period restarts.

The 'version' field is set to '1' for the initial implementation which detects fd exhaustion only when a socket open fails.

(Introduced in tor-0.4.6.1-alpha)

    "router-sig-ed25519"
        [As in router descriptors]

    "router-signature" NL Signature NL
        [At end, exactly once.]
        [No extra arguments]

A document signature as documented in section 1.3, using the initial item "extra-info" and the final item "router-signature", signed with the router's identity key.

Nonterminals in server descriptors

   nickname ::= between 1 and 19 alphanumeric characters ([A-Za-z0-9]),
      case-insensitive.
   hexdigest ::= a '$', followed by 40 hexadecimal characters
      ([A-Fa-f0-9]). [Represents a relay by the digest of its identity
      key.]

   exitpattern ::= addrspec ":" portspec
   portspec ::= "*" | port | port "-" port
   port ::= an integer between 1 and 65535, inclusive.

      [Some implementations incorrectly generate ports with value 0.
       Implementations SHOULD accept this, and SHOULD NOT generate it.
       Connections to port 0 are never permitted.]

   addrspec ::= "*" | ip4spec | ip6spec
   ip4spec ::= ip4 | ip4 "/" num_ip4_bits | ip4 "/" ip4mask
   ip4 ::= an IPv4 address in dotted-quad format
   ip4mask ::= an IPv4 mask in dotted-quad format
   num_ip4_bits ::= an integer between 0 and 32
   ip6spec ::= ip6 | ip6 "/" num_ip6_bits
   ip6 ::= an IPv6 address, surrounded by square brackets.
   num_ip6_bits ::= an integer between 0 and 128

   bool ::= "0" | "1"
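
As an illustration of the portspec production, the following Python sketch parses a portspec into an inclusive port range. The function name is hypothetical. Mapping "*" to the range 1-65535 and requiring the low port not to exceed the high port are assumptions. Port 0 is accepted on input, following the note that implementations SHOULD accept it.

```python
import re

PORT_RE = re.compile(r"[0-9]+")


def parse_portspec(spec: str) -> tuple:
    """Return (low, high) for a portspec such as "*", "80" or "6660-6669"."""
    if spec == "*":
        return (1, 65535)  # assumption: "*" covers every valid port
    low_text, dash, high_text = spec.partition("-")
    if PORT_RE.fullmatch(low_text) is None:
        raise ValueError(f"bad port: {spec!r}")
    if dash and PORT_RE.fullmatch(high_text) is None:
        raise ValueError(f"bad port range: {spec!r}")
    low = int(low_text)
    high = int(high_text) if dash else low
    # Port 0 is tolerated when parsing but should never be generated.
    if not 0 <= low <= high <= 65535:
        raise ValueError(f"port out of range: {spec!r}")
    return (low, high)
```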

Directory authority operation and formats

Every authority has two keys used in this protocol: a signing key, and an authority identity key. (Authorities also have a router identity key used in their role as a router and by earlier versions of the directory protocol.) The identity key is used from time to time to sign new key certificates using new signing keys; it is very sensitive. The signing key is used to sign key certificates and status documents.

Creating key certificates

Key certificates consist of the following items:

"dir-key-certificate-version" version NL

[At start, exactly once.]

Determines the version of the key certificate. MUST be "3" for the protocol described in this document. Implementations MUST reject formats they don't understand.

    "dir-address" IPPort NL
        [At most once]

        An IP:Port for this authority's directory port.

    "fingerprint" fingerprint NL

        [Exactly once.]

Hexadecimal encoding without spaces based on the authority's identity key.

"dir-identity-key" NL a public key in PEM format

[Exactly once.] [No extra arguments]

The long-term authority identity key for this authority. This key SHOULD be at least 2048 bits long; it MUST NOT be shorter than 1024 bits.

"dir-key-published" YYYY-MM-DD HH:MM:SS NL

[Exactly once.]

The time (in UTC) when this document and corresponding key were last generated.

Implementations SHOULD reject certificates that are published too far in the future, though they MAY tolerate some clock skew.

"dir-key-expires" YYYY-MM-DD HH:MM:SS NL

[Exactly once.]

A time (in UTC) after which this key is no longer valid.

Implementations SHOULD reject expired certificates, though they MAY tolerate some clock skew.

"dir-signing-key" NL a key in PEM format

[Exactly once.] [No extra arguments]

The directory server's public signing key. This key MUST be at least 1024 bits, and MAY be longer.

"dir-key-crosscert" NL CrossSignature NL

[Exactly once.] [No extra arguments]

CrossSignature is a signature, made using the certificate's signing key, of the digest of the PKCS1-padded hash of the certificate's identity key. For backward compatibility with broken versions of the parser, we wrap the base64-encoded signature in -----BEGIN ID SIGNATURE----- and -----END ID SIGNATURE----- tags. Implementations MUST allow the "ID " portion to be omitted, however.

Implementations MUST verify that the signature is a correct signature of the hash of the identity key using the signing key.

"dir-key-certification" NL Signature NL

[At end, exactly once.] [No extra arguments]

A document signature as documented in section 1.3, using the initial item "dir-key-certificate-version" and the final item "dir-key-certification", signed with the authority identity key.

Authorities MUST generate a new signing key and corresponding certificate before the key expires.

Accepting server descriptor and extra-info document uploads

When a router posts a signed descriptor to a directory authority, the authority first checks whether it is well-formed and correctly self-signed. If it is, the authority next verifies that the nickname in question is not already assigned to a router with a different public key. Finally, the authority MAY check that the router is not blacklisted because of its key, IP, or another reason.

An authority also keeps a record of all the Ed25519/RSA1024 identity key pairs that it has seen before. It rejects any descriptor that has a known Ed/RSA identity key that it has already seen accompanied by a different RSA/Ed identity key in an older descriptor.

At a future date, authorities will begin rejecting all descriptors whose RSA key was previously accompanied by an Ed25519 key, if the descriptor does not list an Ed25519 key.

At a future date, authorities will begin rejecting all descriptors that do not list an Ed25519 key.

If the descriptor passes these tests, and the authority does not already have a descriptor for a router with this public key, it accepts the descriptor and remembers it.

If the authority does have a descriptor with the same public key, the newly uploaded descriptor is remembered if its publication time is more recent than the most recent old descriptor for that router, and either:

      - There are non-cosmetic differences between the old descriptor and the
        new one.
      - Enough time has passed between the descriptors' publication times.
        (Currently, 2 hours.)

Differences between server descriptors are "non-cosmetic" if they would be sufficient to force an upload as described in section 2.1 above.

Note that the "cosmetic difference" test only applies to uploaded descriptors, not to descriptors that the authority downloads from other authorities.
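
The replacement rule for uploaded descriptors can be sketched as a single predicate. The function and parameter names are hypothetical, the caller is assumed to have already decided whether the differences are non-cosmetic, and treating exactly 2 hours as "enough time" is an assumption.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=2)  # "Currently, 2 hours."


def should_remember_upload(old_published, new_published,
                           non_cosmetic_difference: bool) -> bool:
    """Decide whether an uploaded descriptor replaces the stored one.

    old_published is None when no descriptor is stored for this key.
    """
    if old_published is None:
        return True
    # The upload must be strictly newer than the stored descriptor.
    if new_published <= old_published:
        return False
    enough_time = (new_published - old_published) >= MIN_INTERVAL
    return non_cosmetic_difference or enough_time
```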

When a router posts a signed extra-info document to a directory authority, the authority again checks it for well-formedness and correct signature, and checks that its digest matches the extra-info-digest in some router descriptor that it believes is currently useful. If so, it accepts the document, stores it, and serves it as requested. If not, it drops it.

Computing microdescriptors

Microdescriptors are a stripped-down version of server descriptors generated by the directory authorities which may additionally contain authority-generated information. Microdescriptors contain only the most relevant parts that clients care about. Microdescriptors are expected to be relatively static and only change about once per week. Microdescriptors do not contain any information that clients need to use to decide which servers to fetch information about, or which servers to fetch information from.

Microdescriptors are a straight transform from the server descriptor and the consensus method. Microdescriptors have no header or footer. Each microdescriptor is identified by the hash of its concatenated elements, without a signature by the router. Microdescriptors do not contain any version information, because their version is determined by the consensus method.

Starting with consensus method 8, microdescriptors contain the following elements taken from or based on the server descriptor. Order matters here, because different directory authorities must be able to transform a given server descriptor and consensus method into the exact same microdescriptor.

"onion-key" NL a public key in PEM format

[Exactly once, at start] [No extra arguments]

The "onion-key" element as specified in section 2.1.1.

When generating microdescriptors for consensus method 30 or later, the trailing = sign must be absent. For consensus method 29 or earlier, the trailing = sign must be present.

"ntor-onion-key" SP base-64-encoded-key NL

[Exactly once]

The "ntor-onion-key" element as specified in section 2.1.1.

(Only included when generating microdescriptors for consensus-method 16 or later.)

[Before Tor 0.4.5.1-alpha, this field was optional.]

"a" SP address ":" port NL

[Any number]

Additional advertised addresses for the OR.

Present currently only if the OR advertises at least one IPv6 address; currently, the first address is included and all others are omitted. Any other IPv4 or IPv6 addresses should be ignored.

Address and port are as for "or-address" as specified in section 2.1.1.

(Only included when generating microdescriptors for consensus-methods 14 to 27.)

"family" names NL

[At most once]

The "family" element as specified in section 2.1.1.

When generating microdescriptors for consensus method 29 or later, the following canonicalization algorithm is applied to improve compression:

           For all entries of the form $hexid=name or $hexid~name,
           remove the =name or ~name portion.

           Remove all entries of the form $hexid, where hexid is not
           40 hexadecimal characters long.

           If an entry is a valid nickname, put it into lower case.

           If an entry is a valid $hexid, put it into upper case.

           If there are any entries, add a single $hexid entry for
           the relay in question, so that it is a member of its own
           family.

           Sort all entries in lexical order.

           Remove duplicate entries.

(Note that if an entry is not of the form "nickname", "$hexid", "$hexid=nickname" or "$hexid~nickname", then it will be unchanged: this is what makes the algorithm forward-compatible.)
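
A Python sketch of the canonicalization steps listed above. The function name and regular expressions are this example's own reading of the entry forms, not text from the specification. In particular, it assumes that own_hexid is passed as "$" followed by 40 hexadecimal characters, and that the "if there are any entries" test is applied after invalid $hexid entries have been removed.

```python
import re

NICKNAME_RE = re.compile(r"[A-Za-z0-9]{1,19}")
HEXID_RE = re.compile(r"\$[0-9A-Fa-f]{40}")
HEXID_ANY_LEN_RE = re.compile(r"\$[0-9A-Fa-f]*")
HEXID_NAMED_RE = re.compile(r"(\$[0-9A-Fa-f]*)[=~][A-Za-z0-9]{1,19}")


def canonicalize_family(entries, own_hexid: str) -> list:
    """Canonicalize the entries of a "family" line."""
    kept = []
    for entry in entries:
        # Strip the =name or ~name suffix from $hexid=name / $hexid~name.
        named = HEXID_NAMED_RE.fullmatch(entry)
        if named:
            entry = named.group(1)
        # Drop $hexid entries whose hex part is not 40 characters long.
        if HEXID_ANY_LEN_RE.fullmatch(entry) and not HEXID_RE.fullmatch(entry):
            continue
        if NICKNAME_RE.fullmatch(entry):
            entry = entry.lower()
        elif HEXID_RE.fullmatch(entry):
            entry = entry.upper()
        # Anything else is left unchanged for forward compatibility.
        kept.append(entry)
    if kept:
        kept.append(own_hexid.upper())
    # Sorting a set both orders the entries and removes duplicates.
    return sorted(set(kept))
```

Python compares strings by code point, which for ASCII entries gives the same order as a byte-wise lexical sort.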

"p" SP ("accept" / "reject") SP PortList NL

[Exactly once.]

The exit-policy summary as specified in sections 3.4.1 and 3.8.2.

[With microdescriptors, clients don't learn exact exit policies: clients can only guess whether a relay accepts their request, try the BEGIN request, and might get end-reason-exit-policy if they guessed wrong, in which case they'll have to try elsewhere.]

[In consensus methods before 5, this line was omitted.]

"p6" SP ("accept" / "reject") SP PortList NL

[At most once]

The IPv6 exit policy summary as specified in sections 3.4.1 and 3.8.2. A missing "p6" line is equivalent to "p6 reject 1-65535".

(Only included when generating microdescriptors for consensus-method 15 or later.)

"id" SP "rsa1024" SP base64-encoded-identity-digest NL

[At most once]

The node identity digest (as described in tor-spec.txt), base64 encoded, without trailing =s. This line is included to prevent collisions between microdescriptors.

Implementations SHOULD ignore these lines: they are added to microdescriptors only to prevent collisions.

(Only included when generating microdescriptors for consensus-method 18 or later.)

"id" SP "ed25519" SP base64-encoded-ed25519-identity NL

[At most once]

The node's master Ed25519 identity key, base64 encoded, without trailing =s.

All implementations MUST ignore this key for any microdescriptor whose corresponding entry in the consensus includes the 'NoEdConsensus' flag.

(Only included when generating microdescriptors for consensus-method 21 or later.)

"id" SP keytype ... NL

[At most once per distinct keytype.]

Implementations MUST ignore "id" lines with unrecognized key-types in place of "rsa1024" or "ed25519".

"pr" SP Entries NL

[Exactly once.]

The "proto" element as specified in section 2.1.1.

[Before Tor 0.4.5.1-alpha, this field was optional.]

(Note that with microdescriptors, clients do not learn the RSA identity of their routers: they only learn a hash of the RSA identity key. This is all they need to confirm the actual identity key when doing a TLS handshake, and all they need to put the identity key digest in their CREATE cells.)

Exchanging votes

Authorities divide time into Intervals. Authority administrators SHOULD try to all pick the same interval length, and SHOULD pick intervals that are commonly used divisions of time (e.g., 5 minutes, 15 minutes, 30 minutes, 60 minutes, 90 minutes). Voting intervals SHOULD be chosen to divide evenly into a 24-hour day.

Authorities SHOULD act according to interval and delays in the latest consensus. Lacking a latest consensus, they SHOULD default to a 30-minute Interval, a 5 minute VotingDelay, and a 5 minute DistDelay.

Authorities MUST take pains to ensure that their clocks remain accurate within a few seconds. (Running NTP is usually sufficient.)

The first voting period of each day begins at 00:00 (midnight) UTC. If the last period of the day would be truncated by one-half or more, it is merged with the second-to-last period.
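
The period layout described above can be sketched in Python. The function name is hypothetical, and times are expressed as seconds after midnight UTC. The last period counts as "truncated by one-half or more" when the time left in the day is at most half an interval.

```python
SECONDS_PER_DAY = 86400


def voting_period_starts(interval: int) -> list:
    """Return the start offsets, in seconds after 00:00 UTC, of each period."""
    if interval <= 0 or interval > SECONDS_PER_DAY:
        raise ValueError("interval must be between 1 and 86400 seconds")
    starts = list(range(0, SECONDS_PER_DAY, interval))
    remainder = SECONDS_PER_DAY % interval  # length of a truncated last period
    # Merge a last period that lost half or more of its length into the
    # second-to-last period by dropping its start time.
    if remainder and len(starts) > 1 and 2 * remainder <= interval:
        starts.pop()
    return starts
```

With an interval that divides the day evenly, such as 3600 seconds, no merging takes place and the function returns 24 start times.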

An authority SHOULD publish its vote immediately at the start of each voting period (minus VoteSeconds+DistSeconds). It does this by making it available at

http://<hostname>/tor/status-vote/next/authority.z

and sending it in an HTTP POST request to each other authority at the URL

http://<hostname>/tor/post/vote

If, at the start of the voting period, minus DistSeconds, an authority does not have a current statement from another authority, the first authority downloads the other's statement.

Once an authority has a vote from another authority, it makes it available at

http://<hostname>/tor/status-vote/next/<fp>.z

where <fp> is the fingerprint of the other authority's identity key, and at

http://<hostname>/tor/status-vote/next/d/<d>.z

where <d> is the digest of the vote document.

Also, once an authority receives a vote from another authority, it examines it for new descriptors and fetches them from that authority. This may be the only way for an authority to hear about relays that didn't publish their descriptor to all authorities, and, while it's too late for the authority to include relays in its current vote, it can include them in its next vote. See section 3.6 below for details.

Vote and consensus status document formats

Votes and consensuses are more strictly formatted than other documents in this specification, since different authorities must be able to generate exactly the same consensus given the same set of votes.

The procedure for deciding when to generate vote and consensus status documents is described in section 1.4 on the voting timeline.

Status documents contain a preamble, an authority section, a list of router status entries, and one or more footer signatures, in that order.

Unlike other formats described above, a SP in these documents must be a single space character (hex 20).

Some items appear only in votes, and some items appear only in consensuses. Unless specified, items occur in both.

The preamble contains the following items. They SHOULD occur in the order given here:

"network-status-version" SP version NL

[At start, exactly once.]

A document format version. For this specification, the version is "3".

"vote-status" SP type NL

[Exactly once.]

The status MUST be "vote" or "consensus", depending on the type of the document.

"consensus-methods" SP IntegerList NL

[At most once for votes; does not occur in consensuses.]

A space-separated list of supported methods for generating consensuses from votes. See section 3.8.1 for details. Absence of the line means that only method "1" is supported.

"consensus-method" SP Integer NL

[At most once for consensuses; does not occur in votes.] [No extra arguments]

See section 3.8.1 for details.

(Only included when the vote is generated with consensus-method 2 or later.)

"published" SP YYYY-MM-DD SP HH:MM:SS NL

[Exactly once for votes; does not occur in consensuses.]

The publication time for this status document (if a vote).

"valid-after" SP YYYY-MM-DD SP HH:MM:SS NL

[Exactly once.]

The start of the Interval for this vote. Before this time, the consensus document produced from this vote is not officially in use.

(Note that because of propagation delays, clients and relays may see consensus documents that are up to DistSeconds earlier than this time, and should not warn about them.)

See section 1.4 for voting timeline information.

"fresh-until" SP YYYY-MM-DD SP HH:MM:SS NL

[Exactly once.]

The time at which the next consensus should be produced; before this time, there is no point in downloading another consensus, since there won't be a new one. See section 1.4 for voting timeline information.

"valid-until" SP YYYY-MM-DD SP HH:MM:SS NL

[Exactly once.]

The end of the Interval for this vote. After this time, all clients should try to find a more recent consensus. See section 1.4 for voting timeline information.

In practice, clients continue to use the consensus for up to 24 hours after it is no longer valid, if no more recent consensus can be downloaded.

"voting-delay" SP VoteSeconds SP DistSeconds NL

[Exactly once.]

VoteSeconds is the number of seconds that we will allow to collect votes from all authorities; DistSeconds is the number of seconds we'll allow to collect signatures from all authorities. See section 1.4 for voting timeline information.

"client-versions" SP VersionList NL

[At most once.]

A comma-separated list of recommended Tor versions for client usage, in ascending order. The versions are given as defined by version-spec.txt. If absent, no opinion is held about client versions.

"server-versions" SP VersionList NL

[At most once.]

A comma-separated list of recommended Tor versions for relay usage, in ascending order. The versions are given as defined by version-spec.txt. If absent, no opinion is held about server versions.

"package" SP PackageName SP Version SP URL SP DIGESTS NL

[Any number of times.]

For this element:

        PACKAGENAME = NONSPACE
        VERSION = NONSPACE
        URL = NONSPACE
        DIGESTS = DIGEST | DIGESTS SP DIGEST
        DIGEST = DIGESTTYPE "=" DIGESTVAL
        NONSPACE = one or more non-space printing characters
        DIGESTVAL = DIGESTTYPE = one or more non-space printing characters
              other than "=".

Indicates that a package called PACKAGENAME of version VERSION may be found at URL, and its digest as computed with DIGESTTYPE is equal to DIGESTVAL. In consensuses, these lines are sorted lexically by "PACKAGENAME VERSION" pairs, and DIGESTTYPES must appear in ascending order. A consensus must not contain the same "PACKAGENAME VERSION" more than once. If a vote contains the same "PACKAGENAME VERSION" more than once, all but the last is ignored.

Included in consensuses only for method 19 and later.

"known-flags" SP FlagList NL

[Exactly once.]

A space-separated list of all of the flags that this document might contain. A flag is "known" either because the authority knows about it and might set it (if in a vote), or because enough votes were counted for the consensus for an authoritative opinion to have been formed about its status.

"flag-thresholds" SP Thresholds NL

[At most once for votes; does not occur in consensuses.]

         A space-separated list of the internal performance thresholds
         that the directory authority had at the moment it was forming
         a vote.

         The metaformat is:
            Thresholds = Threshold | Threshold SP Thresholds
            Threshold = ThresholdKey '=' ThresholdVal
            ThresholdKey = (KeywordChar | "_") +
            ThresholdVal = [0-9]+("."[0-9]+)? "%"?
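
The metaformat above can be checked with a short Python sketch. The function name is hypothetical, the input is assumed to be the line without its trailing NL, and KeywordChar is assumed to mean ASCII letters, digits and "-". The sample values used with it are made up for illustration.

```python
import re

# ThresholdKey = (KeywordChar | "_")+ ; ThresholdVal = [0-9]+("."[0-9]+)? "%"?
THRESHOLD_RE = re.compile(r"([A-Za-z0-9_-]+)=([0-9]+(?:\.[0-9]+)?%?)")


def parse_flag_thresholds(line: str) -> dict:
    """Parse a "flag-thresholds" line into a key -> raw value mapping."""
    keyword, _, rest = line.partition(" ")
    if keyword != "flag-thresholds":
        raise ValueError("not a flag-thresholds line")
    thresholds = {}
    for item in rest.split(" "):
        match = THRESHOLD_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed threshold: {item!r}")
        # Values stay as strings so a trailing "%" is preserved.
        thresholds[match.group(1)] = match.group(2)
    return thresholds
```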

         Commonly used Thresholds at this point include:

         "stable-uptime" -- Uptime (in seconds) required for a relay
                            to be marked as stable.

         "stable-mtbf" -- MTBF (in seconds) required for a relay to be
                          marked as stable.

         "enough-mtbf" -- Whether we have measured enough MTBF to look
                          at stable-mtbf instead of stable-uptime.

         "fast-speed" -- Bandwidth (in bytes per second) required for
                         a relay to be marked as fast.

         "guard-wfu" -- WFU (in seconds) required for a relay to be
                        marked as guard.

         "guard-tk" -- Weighted Time Known (in seconds) required for a
                       relay to be marked as guard.

         "guard-bw-inc-exits" -- If exits can be guards, then all guards
                                 must have a bandwidth this high.

         "guard-bw-exc-exits" -- If exits can't be guards, then all guards
                                 must have a bandwidth this high.

         "ignoring-advertised-bws" -- 1 if we have enough measured bandwidths
                                 that we'll ignore the advertised bandwidth
                                 claims of routers without measured bandwidth.

"recommended-client-protocols" SP Entries NL
"recommended-relay-protocols" SP Entries NL
"required-client-protocols" SP Entries NL
"required-relay-protocols" SP Entries NL

[At most once for each.]

The "proto" element as specified in section 2.1.1.

To vote on these entries, a protocol/version combination is included only if it is listed by a majority of the voters.

These lines should be voted on. A majority of votes is sufficient to make a protocol un-supported. A supermajority of authorities (2/3) are needed to make a protocol required. The required protocols should not be torrc-configurable, but rather should be hardwired in the Tor code.

The tor-spec.txt section 9 details how a relay and a client should behave when they encounter these lines in the consensus.

"params" SP [Parameters] NL

[At most once]

        Parameter ::= Keyword '=' Int32
        Int32 ::= A decimal integer between -2147483648 and 2147483647.
        Parameters ::= Parameter | Parameters SP Parameter

The parameters list, if present, contains a space-separated list of case-sensitive key-value pairs, sorted in lexical order by their keyword (as ASCII byte strings). Each parameter has its own meaning.

(Only included when the vote is generated with consensus-method 7 or later.)

See param-spec.txt for a list of parameters and their meanings.
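
As a non-normative illustration, the grammar and ordering rule above can be checked with a short parser. This is a minimal sketch: the function name parse_params_line is hypothetical, and treating a repeated keyword as an error is an assumption, since the text above only requires sorted order.

```python
import re

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
_PARAM_RE = re.compile(r"([^=\s]+)=(-?[0-9]+)")


def parse_params_line(line: str) -> dict:
    """Parse one "params" line into a keyword -> Int32 mapping."""
    fields = line.split()
    if not fields or fields[0] != "params":
        raise ValueError("not a params line")
    params = {}
    previous = None
    for item in fields[1:]:
        match = _PARAM_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed parameter {item!r}")
        keyword, value = match.group(1), int(match.group(2))
        if not INT32_MIN <= value <= INT32_MAX:
            raise ValueError(f"{keyword} is outside the Int32 range")
        # Keywords must appear in lexical order as ASCII byte strings.
        # Assumption: a repeated keyword is rejected as well.
        encoded = keyword.encode("ascii")
        if previous is not None and encoded <= previous:
            raise ValueError("keywords are not in sorted order")
        previous = encoded
        params[keyword] = value
    return params
```

For example, parse_params_line("params bwweightscale=10000 circwindow=1000") yields a two-entry mapping, while a line whose keywords are out of order is rejected.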

"shared-rand-previous-value" SP NumReveals SP Value NL

[At most once]

NumReveals ::= An integer greater than or equal to 0.
Value ::= Base64-encoded-data

The shared_random_value that was generated during the second-to-last shared randomness protocol run. For example, if this document was created on the 5th of November, this field carries the shared random value generated during the protocol run of the 3rd of November.

See section [SRCALC] of srv-spec.txt for instructions on how to compute this value, and see section [CONS] for why we include old shared random values in votes and consensus.

Value is the actual shared random value encoded in base64. It will be exactly 256 bits long. NumReveals is the number of commits used to generate this SRV.

"shared-rand-current-value" SP NumReveals SP Value NL

[At most once]

NumReveals ::= An integer greater than or equal to 0.
Value ::= Base64-encoded-data

The shared_random_value that was generated during the latest shared randomness protocol run. For example, if this document was created on the 5th of November, this field carries the shared random value generated during the protocol run of the 4th of November.

See section [SRCALC] of srv-spec.txt for instructions on how to compute this value given the active commits.

Value is the actual shared random value encoded in base64. It will be exactly 256 bits long. NumReveals is the number of commits used to generate this SRV.

"bandwidth-file-headers" SP KeyValues NL

[At most once for votes; does not occur in consensuses.]

KeyValues ::= "" | KeyValue | KeyValues SP KeyValue
KeyValue ::= Keyword '=' Value
Value ::= ArgumentCharValue+
ArgumentCharValue ::= any printing ASCII character except NL and SP.

The headers from the bandwidth file used to generate this vote. The bandwidth file headers are described in bandwidth-file-spec.txt.

If an authority is not configured with a V3BandwidthsFile, this line SHOULD NOT appear in its vote.

If an authority is configured with a V3BandwidthsFile, but parsing fails, this line SHOULD appear in its vote, but without any headers.

First-appeared: Tor 0.3.5.1-alpha.

"bandwidth-file-digest" 1*(SP algorithm "=" digest) NL

[At most once for votes; does not occur in consensuses.]

A digest of the bandwidth file used to generate this vote. "algorithm" is the name of the hash algorithm producing "digest", which can be "sha256" or another algorithm. "digest" is the base64 encoding of the hash of the bandwidth file, with trailing =s omitted.

If an authority is not configured with a V3BandwidthsFile, this line SHOULD NOT appear in its vote.

If an authority is configured with a V3BandwidthsFile, but parsing fails, this line SHOULD appear in its vote, with the digest(s) of the unparseable file.

First-appeared: Tor 0.4.0.4-alpha
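
The digest encoding described above (base64 with trailing "=" characters omitted) can be sketched as follows. This is an illustration only: the function name is hypothetical, and only the "sha256" algorithm is shown.

```python
import base64
import hashlib


def bandwidth_file_digest_line(contents: bytes) -> str:
    """Build a "bandwidth-file-digest" line carrying one sha256 digest."""
    raw = hashlib.sha256(contents).digest()
    # Base64-encode the hash, then drop the trailing "=" padding.
    digest = base64.b64encode(raw).decode("ascii").rstrip("=")
    return f"bandwidth-file-digest sha256={digest}\n"
```

A SHA-256 hash is 32 bytes long, so the unpadded base64 digest is always 43 characters.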

The authority section of a vote contains the following items, followed in turn by the authority's current key certificate:

    "dir-source" SP nickname SP identity SP address SP IP SP dirport SP
       orport NL

        [Exactly once, at start]

Describes this authority. The nickname is a convenient identifier for the authority. The identity is an uppercase hex fingerprint of the authority's current (v3 authority) identity key. The address is the server's hostname. The IP is the server's current IP address, and dirport is its current directory port. The orport is the port at that address where the authority listens for OR connections.

"contact" SP string NL

[Exactly once]

An arbitrary string describing how to contact the directory server's administrator. Administrators should include at least an email address and a PGP fingerprint.

"legacy-dir-key" SP FINGERPRINT NL

[At most once]

Lists a fingerprint for an obsolete identity key still used by this authority to keep older clients working. This option is used to keep a key around for a little while in case the authorities need to migrate many identity keys at once. (Generally, this would only happen because of a security vulnerability that affected multiple authorities, like the Debian OpenSSL RNG bug of May 2008.)

"shared-rand-participate" NL

[At most once]

Denotes that the directory authority supports and can participate in the shared random protocol.

"shared-rand-commit" SP Version SP AlgName SP Identity SP Commit [SP Reveal] NL

[Any number of times]

Version ::= An integer greater than or equal to 0.
AlgName ::= 1*(ALPHA / DIGIT / "_" / "-")
Identity ::= 40 * HEXDIG
Commit ::= Base64-encoded-data
Reveal ::= Base64-encoded-data

Denotes a directory authority commit for the shared randomness protocol, containing the commitment value and potentially also the reveal value. See sections [COMMITREVEAL] and [VALIDATEVALUES] of srv-spec.txt on how to generate and validate these values.

Version is the current shared randomness protocol version. AlgName is the hash algorithm that is used (e.g. "sha3-256") and Identity is the authority's SHA1 v3 identity fingerprint. Commit is the encoded commitment value in base64. Reveal is optional and if it's set, it contains the reveal value in base64.

If a vote contains multiple commits from the same authority, the receiver MUST only consider the first commit listed.

"shared-rand-previous-value" SP NumReveals SP Value NL

[At most once]

See shared-rand-previous-value description above.

"shared-rand-current-value" SP NumReveals SP Value NL

[At most once]

See shared-rand-current-value description above.

The authority section of a consensus contains groups of the following items, in the order given, with one group for each authority that contributed to the consensus, with groups sorted by authority identity digest:

    "dir-source" SP nickname SP identity SP address SP IP SP dirport SP
       orport NL

        [Exactly once, at start]

        As in the authority section of a vote.

    "contact" SP string NL

        [Exactly once.]

        As in the authority section of a vote.

    "vote-digest" SP digest NL

        [Exactly once.]

A digest of the vote from the authority that contributed to this consensus, as signed (that is, not including the signature). (Hex, upper-case.)

For each "legacy-dir-key" in the vote, there is an additional "dir-source" line containing that legacy key's fingerprint, the authority's nickname with "-legacy" appended, and all other fields as in the main "dir-source" line for that authority. These "dir-source" lines do not have corresponding "contact" or "vote-digest" entries.

Each router status entry contains the following items. Router status entries are sorted in ascending order by identity digest.

    "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
        SP DirPort NL

        [At start, exactly once.]

"Nickname" is the OR's nickname. "Identity" is a hash of its identity key, encoded in base64, with trailing equals sign(s) removed. "Digest" is a hash of its most recent descriptor as signed (that is, not including the signature) by the RSA identity key (see section 1.3.), encoded in base64.

"Publication" was once the publication time of the router's most recent descriptor, in the form YYYY-MM-DD HH:MM:SS, in UTC. Now it is only used in votes, and may be set to a fixed value in consensus documents. Implementations SHOULD ignore this value in non-vote documents.

"IP" is its current IP address; ORPort is its current OR port, "DirPort" is its current directory port, or "0" for "none".

"a" SP address ":" port NL

[Any number]

The first advertised IPv6 address for the OR, if it is reachable.

Present only if the OR advertises at least one IPv6 address, and the authority believes that the first advertised address is reachable. Any other IPv4 or IPv6 addresses should be ignored.

Address and port are as for "or-address" as specified in section 2.1.1.

(Only included when the vote or consensus is generated with consensus-method 14 or later.)

"s" SP Flags NL

[Exactly once.]

A series of space-separated status flags, in lexical order (as ASCII byte strings). Currently documented flags are:

          "Authority" if the router is a directory authority.
          "BadExit" if the router is believed to be useless as an exit node
             (because its ISP censors it, because it is behind a restrictive
             proxy, or for some similar reason).
          "Exit" if the router is more useful for building
             general-purpose exit circuits than for relay circuits.  The
             path building algorithm uses this flag; see path-spec.txt.
          "Fast" if the router is suitable for high-bandwidth circuits.
          "Guard" if the router is suitable for use as an entry guard.
          "HSDir" if the router is considered a v2 hidden service directory.
          "MiddleOnly" if the router is considered unsuitable for
             usage other than as a middle relay. Clients do not need
             to handle this option, since when it is present, the authorities
             will automatically vote against flags that would make the router
             usable in other positions. (Since 0.4.7.2-alpha.)
          "NoEdConsensus" if any Ed25519 key in the router's descriptor or
             microdescriptor does not reflect authority consensus.
          "Stable" if the router is suitable for long-lived circuits.
          "StaleDesc" if the router should upload a new descriptor because
             the old one is too old.
          "Running" if the router is currently usable over all its published
             ORPorts. (Authorities ignore IPv6 ORPorts unless configured to
             check IPv6 reachability.) Relays without this flag are omitted
             from the consensus, and current clients (since 0.2.9.4-alpha)
             assume that every listed relay has this flag.
          "Valid" if the router has been 'validated'. Clients before
             0.2.9.4-alpha would not use routers without this flag by
             default. Currently, relays without this flag are omitted
             from the consensus, and current (post-0.2.9.4-alpha) clients
             assume that every listed relay has this flag.
          "V2Dir" if the router implements the v2 directory protocol or
             higher.

    "v" SP version NL

        [At most once.]

The version of the Tor protocol that this relay is running. If the value begins with "Tor" SP, the rest of the string is a Tor version number, and the protocol is "The Tor protocol as supported by the given version of Tor." Otherwise, if the value begins with some other string, Tor has upgraded to a more sophisticated protocol versioning system, and the protocol is "a version of the Tor protocol more recent than any we recognize."

Directory authorities SHOULD omit version strings they receive from descriptors if they would cause "v" lines to be over 128 characters long.

"pr" SP Entries NL

[At most once.]

The "proto" family element as specified in section 2.1.1.

During voting, authorities copy these lines immediately below the "v" lines. When a descriptor does not contain a "proto" entry, the authorities should reconstruct it using the approach described below in section D. They are included in the consensus using the same rules as currently used for "v" lines, if a sufficiently late consensus method is in use.

"w" SP "Bandwidth=" INT [SP "Measured=" INT] [SP "Unmeasured=1"] NL

[At most once.]

An estimate of the bandwidth of this relay, in an arbitrary unit (currently kilobytes per second). Used to weight router selection. See section 3.4.2 for details on how the value of Bandwidth is determined in a consensus.

Additionally, the Measured= keyword is present in votes by participating bandwidth measurement authorities to indicate a measured bandwidth currently produced by measuring stream capacities. It does not occur in consensuses.

'Bandwidth=' and 'Measured=' values must be between 0 and 2^32 - 1 inclusive.

The "Unmeasured=1" value is included in consensuses generated with method 17 or later when the 'Bandwidth=' value is not based on a threshold of 3 or more measurements for this relay.

Other weighting keywords may be added later. Clients MUST ignore keywords they do not recognize.
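
A client-side reading of the "w" line rules above might look like the sketch below. The function name parse_w_line is hypothetical; the sketch skips unrecognized keywords, as required, and enforces the stated range on the two bandwidth values.

```python
MAX_BANDWIDTH = 2**32 - 1
KNOWN_KEYWORDS = ("Bandwidth", "Measured", "Unmeasured")


def parse_w_line(line: str) -> dict:
    """Parse a "w" line, ignoring any keyword that is not recognized."""
    fields = line.split()
    if not fields or fields[0] != "w":
        raise ValueError("not a w line")
    result = {}
    for item in fields[1:]:
        keyword, _, raw = item.partition("=")
        if keyword not in KNOWN_KEYWORDS:
            continue  # unrecognized keywords are ignored, not rejected
        value = int(raw)
        if keyword != "Unmeasured" and not 0 <= value <= MAX_BANDWIDTH:
            raise ValueError(f"{keyword} is out of range")
        result[keyword] = value
    if "Bandwidth" not in result:
        raise ValueError("w line lacks a Bandwidth= value")
    return result
```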

"p" SP ("accept" / "reject") SP PortList NL

[At most once.]

PortList = PortOrRange
PortList = PortList "," PortOrRange
PortOrRange = INT "-" INT / INT

A list of those ports that this router supports (if 'accept') or does not support (if 'reject') for exit to "most addresses".
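
To illustrate the PortList grammar, the sketch below parses a "p" line and answers whether a port is covered by the summary. The names parse_p_line and port_allowed are hypothetical, and restricting ports to 1 through 65535 is an assumption that the grammar above does not state.

```python
def parse_p_line(line: str):
    """Return (policy, ranges) where ranges is a list of (low, high) pairs."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "p" or fields[1] not in ("accept", "reject"):
        raise ValueError("not a well-formed p line")
    ranges = []
    for entry in fields[2].split(","):
        low_text, dash, high_text = entry.partition("-")
        low = int(low_text)
        high = int(high_text) if dash else low
        # Assumption: valid ports are 1..65535 and ranges are ascending.
        if not 1 <= low <= high <= 65535:
            raise ValueError(f"bad port or range {entry!r}")
        ranges.append((low, high))
    return fields[1], ranges


def port_allowed(policy: str, ranges, port: int) -> bool:
    """True if the summary says the router exits to this port."""
    listed = any(low <= port <= high for low, high in ranges)
    return listed if policy == "accept" else not listed
```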

"m" SP methods 1*(SP algorithm "=" digest) NL

[Any number, only in votes.]

Microdescriptor hashes for all consensus methods that an authority supports and that use the same microdescriptor format. "methods" is a comma-separated list of the consensus methods that the authority believes will produce "digest". "algorithm" is the name of the hash algorithm producing "digest", which can be "sha256" or something else, depending on the consensus "methods" supporting this algorithm. "digest" is the base64 encoding of the hash of the router's microdescriptor with trailing =s omitted.

     "id" SP "ed25519" SP ed25519-identity NL
     "id" SP "ed25519" SP "none" NL
        [vote only, at most once]

     "stats" SP [KeyValues] NL

        [At most once. Vote only]

KeyValue ::= Keyword '=' Number
Number ::= [0-9]+("."[0-9]+)?
KeyValues ::= KeyValue | KeyValues SP KeyValue

A line containing various statistics that an authority has computed for this relay. Each statistic is represented as a key=value pair. Reported keys are:

          "wfu"  - Weighted Fractional Uptime
          "tk"   - Weighted Time Known
          "mtbf" - Mean Time Between Failures (stability)

          (As of tor-0.4.6.1-alpha)

The footer section is delineated in all votes and consensuses supporting consensus method 9 and above with the following:

"directory-footer" NL [No extra arguments]

It contains two subsections, a bandwidth-weights line and a directory-signature. (Prior to consensus method 9, footers only contained directory-signatures without a 'directory-footer' line or bandwidth-weights.)

The bandwidth-weights line appears At Most Once for a consensus. It does not appear in votes.

"bandwidth-weights" [SP Weights] NL

Weight ::= Keyword '=' Int32
Int32 ::= A decimal integer between -2147483648 and 2147483647.
Weights ::= Weight | Weights SP Weight

List of optional weights to apply to router bandwidths during path selection. They are sorted in lexical order (as ASCII byte strings), and their values are divided by the consensus's "bwweightscale" param. Definitions of our known entries are:

         Wgg - Weight for Guard-flagged nodes in the guard position
         Wgm - Weight for non-flagged nodes in the guard position
         Wgd - Weight for Guard+Exit-flagged nodes in the guard position

         Wmg - Weight for Guard-flagged nodes in the middle position
         Wmm - Weight for non-flagged nodes in the middle position
         Wme - Weight for Exit-flagged nodes in the middle position
         Wmd - Weight for Guard+Exit-flagged nodes in the middle position

         Weg - Weight for Guard-flagged nodes in the exit position
         Wem - Weight for non-flagged nodes in the exit position
         Wee - Weight for Exit-flagged nodes in the exit position
         Wed - Weight for Guard+Exit-flagged nodes in the exit position

         Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
         Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
         Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
         Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes

         Wbg - Weight for Guard-flagged nodes for BEGIN_DIR requests
         Wbm - Weight for non-flagged nodes for BEGIN_DIR requests
         Wbe - Weight for Exit-flagged nodes for BEGIN_DIR requests
         Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests

       These values are calculated as specified in section 3.8.3.
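
As a rough illustration of how a consumer might read this line, the sketch below parses the weights and divides each by the scale. The function name is hypothetical, and the caller is assumed to supply the "bwweightscale" value taken from the consensus "params" line.

```python
def parse_bandwidth_weights(line: str, bwweightscale: int) -> dict:
    """Map each weight keyword to its value divided by bwweightscale."""
    fields = line.split()
    if not fields or fields[0] != "bandwidth-weights":
        raise ValueError("not a bandwidth-weights line")
    if bwweightscale <= 0:
        raise ValueError("bwweightscale must be positive")
    weights = {}
    for item in fields[1:]:
        keyword, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"malformed weight {item!r}")
        weights[keyword] = int(raw) / bwweightscale
    return weights
```

With a scale of 10000, an entry such as "Wgg=5000" becomes a multiplier of 0.5 for Guard-flagged nodes in the guard position.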

The signature contains the following item, which appears Exactly Once for a vote, and At Least Once for a consensus.

    "directory-signature" [SP Algorithm] SP identity SP signing-key-digest
        NL Signature

This is a signature of the status document, with the initial item "network-status-version", and the signature item "directory-signature", using the signing key. (In this case, we take the hash through the space after directory-signature, not the newline: this ensures that all authorities sign the same thing.) "identity" is the hex-encoded digest of the authority identity key of the signing authority, and "signing-key-digest" is the hex-encoded digest of the current authority signing key of the signing authority.

The Algorithm is one of "sha1" or "sha256" if it is present; implementations MUST ignore directory-signature entries with an unrecognized Algorithm. "sha1" is the default, if no Algorithm is given. The algorithm describes how to compute the hash of the document before signing it.

"ns"-flavored consensus documents must contain only sha1 signatures. Votes and microdescriptor documents may contain other signature types. Note that only one signature from each authority should be "counted" as meaning that the authority has signed the consensus.

(Tor clients before 0.2.3.x did not understand the 'algorithm' field.)
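
The hashing rule above (from "network-status-version" through the space after "directory-signature") can be sketched as follows. The function name signed_digest is hypothetical, and the sketch assumes the first "directory-signature" item in the document marks the end of the signed portion. It computes only the hash; verifying the signature itself is out of scope.

```python
import hashlib


def signed_digest(document: bytes, algorithm: str = "sha1") -> bytes:
    """Hash the signed portion of a vote or consensus."""
    if algorithm not in ("sha1", "sha256"):
        raise ValueError("unrecognized Algorithm")
    if not document.startswith(b"network-status-version"):
        raise ValueError("document does not start with network-status-version")
    marker = b"\ndirectory-signature "
    end = document.find(marker)
    if end < 0:
        raise ValueError("no directory-signature item found")
    # Include the space after the keyword, but nothing that follows it.
    signed_portion = document[: end + len(marker)]
    return hashlib.new(algorithm, signed_portion).digest()
```

Stopping at the space rather than the newline means the identity and signing-key-digest arguments, which differ per authority, are excluded from the hash.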

Assigning flags in a vote

(This section describes how directory authorities choose which status flags to apply to routers. Later directory authorities MAY do things differently, so long as clients keep working well. Clients MUST NOT depend on the exact behaviors in this section.)

In the below definitions, a router is considered "active" if it is running, valid, and not hibernating.

When we speak of a router's bandwidth in this section, we mean either its measured bandwidth, or its advertised bandwidth. If a sufficient threshold (configurable with MinMeasuredBWsForAuthToIgnoreAdvertised, 500 by default) of routers have measured bandwidth values, then the authority bases flags on measured bandwidths, and treats nodes with non-measured bandwidths as if their bandwidths were zero. Otherwise, it uses measured bandwidths for nodes that have them, and advertised bandwidths for other nodes.
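
The bandwidth selection rule above can be summarized in a few lines. This is a sketch only: the function and parameter names are hypothetical, and comparing the count of measured routers with ">=" against the threshold is an assumption about what "a sufficient threshold" means.

```python
from typing import Optional


def bandwidth_for_flags(
    measured: Optional[int],
    advertised: int,
    routers_with_measurements: int,
    threshold: int = 500,
) -> int:
    """Pick the bandwidth an authority uses when assigning flags."""
    if routers_with_measurements >= threshold:
        # Enough measurements exist: unmeasured routers count as zero.
        return measured if measured is not None else 0
    # Otherwise fall back to the advertised value when unmeasured.
    return measured if measured is not None else advertised
```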

When computing thresholds based on percentiles of nodes, an authority only considers nodes that are active, that have not been omitted as a sybil (see below), and whose bandwidth is at least 4 KB. Nodes that don't meet these criteria do not influence any threshold calculations (including calculation of stability and uptime and bandwidth thresholds) and also do not have their Exit status change.

"Valid" -- a router is 'Valid' if it is running a version of Tor not known to be broken, and the directory authority has not blacklisted it as suspicious.

   "Named" --
   "Unnamed" -- Directory authorities no longer assign these flags.
      They were once used to determine whether a relay's nickname was
      canonically linked to its public key.

"Running" -- A router is 'Running' if the authority managed to connect to it successfully within the last 45 minutes on all its published ORPorts. Authorities check reachability on:

     * the IPv4 ORPort in the "r" line, and
     * the IPv6 ORPort considered for the "a" line, if:
       * the router advertises at least one IPv6 ORPort, and
       * AuthDirHasIPv6Connectivity 1 is set on the authority.

A minority of voting authorities that set AuthDirHasIPv6Connectivity will drop unreachable IPv6 ORPorts from the full consensus. Consensus method 27 in 0.3.3.x puts IPv6 ORPorts in the microdesc consensus, so that authorities can drop unreachable IPv6 ORPorts from all consensus flavors. Consensus method 28 removes IPv6 ORPorts from microdescriptors.

"Stable" -- A router is 'Stable' if it is active, and either its Weighted MTBF is at least the median for known active routers or its Weighted MTBF corresponds to at least 7 days. Routers are never called Stable if they are running a version of Tor known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc are stupid this way.)

To calculate weighted MTBF, compute the weighted mean of the lengths of all intervals when the router was observed to be up, weighting intervals by $\alpha^n$, where $n$ is the amount of time that has passed since the interval ended, and $\alpha$ is chosen so that measurements over approximately one month old no longer influence the weighted MTBF much.
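
Read literally, the weighted mean described above could be computed as in the following sketch. The function name, the representation of up intervals as (start, end) pairs, and the choice of time unit are all assumptions made for illustration; actual implementations may discount history differently.

```python
def weighted_mtbf(up_intervals, now: float, alpha: float) -> float:
    """Weighted mean of up-interval lengths, weighting each by alpha**n.

    up_intervals holds (start, end) pairs; n is the time elapsed between
    the end of an interval and now, in the same unit as the timestamps.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for start, end in up_intervals:
        weight = alpha ** (now - end)
        weighted_sum += weight * (end - start)
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

With alpha equal to 1 this reduces to the plain mean of the interval lengths; values of alpha below 1 make older intervals count for less.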

[XXXX what happens when we have less than 4 days of MTBF info.]

"Exit" -- A router is called an 'Exit' iff it allows exits to at least one /8 address space on each of ports 80 and 443. (Up until Tor version 0.3.2, the flag was assigned if relays exit to at least two of the ports 80, 443, and 6667.)

"Fast" -- A router is 'Fast' if it is active, and its bandwidth is either in the top 7/8ths for known active routers or at least 100KB/s.

"Guard" -- A router is a possible Guard if all of the following apply:

       - It is Fast,
       - It is Stable,
       - Its Weighted Fractional Uptime is at least the median for "familiar"
         active routers,
       - It is "familiar",
       - Its bandwidth is at least AuthDirGuardBWGuarantee (if set, 2 MB by
         default), OR its bandwidth is among the 25% fastest relays,
       - It qualifies for the V2Dir flag as described below (this
         constraint was added in 0.3.3.x, because in 0.3.0.x clients
         started avoiding guards that didn't also have the V2Dir flag).

To calculate weighted fractional uptime, compute the fraction of time that the router is up in any given day, weighting so that downtime and uptime in the past count less.

A node is 'familiar' if 1/8 of all active nodes have appeared more recently than it, OR it has been around for a few weeks.

"Authority" -- A router is called an 'Authority' if the authority generating the network-status document believes it is an authority.

"V2Dir" -- A router supports the v2 directory protocol or higher if it has an open directory port OR a tunnelled-dir-server line in its router descriptor, and it is running a version of the directory protocol that supports the functionality clients need. (Currently, every supported version of Tor supports the functionality that clients need, but some relays might set "DirCache 0" or set really low rate limiting, making them unqualified to be a directory mirror, i.e. they will omit the tunnelled-dir-server line from their descriptor.)

"HSDir" -- A router is a v2 hidden service directory if it stores and serves v2 hidden service descriptors, has the Stable and Fast flag, and the authority believes that it's been up for at least 96 hours (or the current value of MinUptimeHidServDirectoryV2).

"MiddleOnly" -- An authority should vote for this flag if it believes that a relay is unsuitable for use except as a middle relay. When voting for this flag, the authority should also vote against "Exit", "Guard", "HsDir", and "V2Dir". When voting for this flag, if the authority votes on the "BadExit" flag, the authority should vote in favor of "BadExit". (This flag was added in 0.4.7.2-alpha.)

"NoEdConsensus" -- authorities should not vote on this flag; it is produced as part of the consensus for consensus method 22 or later.

"StaleDesc" -- authorities should vote to assign this flag if the published time on the descriptor is over 18 hours in the past. (This flag was added in 0.4.0.1-alpha.)

"Sybil" -- authorities SHOULD NOT accept more than 2 relays on a single IP. If this happens, the authority should vote for the excess relays, but should omit the Running or Valid flags and instead should assign the "Sybil" flag. When there are more than 2 (or AuthDirMaxServersPerAddr) relays to choose from, authorities should first prefer authorities to non-authorities, then prefer Running to non-Running, and then prefer high-bandwidth to low-bandwidth relays. In this comparison, measured bandwidth is used unless it is not present for a router, in which case advertised bandwidth is used.

Thus, the network-status vote includes all non-blacklisted, non-expired, non-superseded descriptors.

The bandwidth in a "w" line should be taken as the best estimate of the router's actual capacity that the authority has. For now, this should be the lesser of the observed bandwidth and bandwidth rate limit from the server descriptor. It is given in kilobytes per second, and capped at some arbitrary value (currently 10 MB/s).

The Measured= keyword on a "w" line in a vote is currently computed by multiplying the previous published consensus bandwidth by the ratio of the measured average node stream capacity to the network average. If 3 or more authorities provide a Measured= keyword for a router, the authorities produce a consensus containing a "w" Bandwidth= keyword equal to the median of the Measured= votes.

As a special case, if the "w" line in a vote is about a relay with the Authority flag, it should not include a Measured= keyword. The goal is to leave such relays marked as Unmeasured, so they can reserve their attention for authority-specific activities. "w" lines for votes about authorities may include the bandwidth authority's measurement using a different keyword, e.g. MeasuredButAuthority=, so it can still be reported and recorded for posterity.

The ports listed in a "p" line should be taken as those ports for which the router's exit policy permits 'most' addresses, ignoring any accept not for all addresses, ignoring all rejects for private netblocks. "Most" addresses are permitted if no more than 2^25 IPv4 addresses (two /8 networks) were blocked. The list is encoded as described in section 3.8.2.

Serving bandwidth list files

If an authority has used a bandwidth list file to generate a vote document it SHOULD make it available at

http://<hostname>/tor/status-vote/next/bandwidth.z

at the start of each voting period.

It MUST NOT attempt to send its bandwidth list file in an HTTP POST to other authorities and it SHOULD NOT make bandwidth list files from other authorities available.

If an authority makes this file available, it MUST be the bandwidth file used to create the vote document available at

http://<hostname>/tor/status-vote/next/authority.z

To avoid inconsistent reads, authorities SHOULD only read the bandwidth file once per voting period. Further processing and serving SHOULD use a cached copy.

The bandwidth list format is described in bandwidth-file-spec.txt.

The standard URLs for bandwidth list files first appeared in Tor 0.4.0.4-alpha.

Downloading missing certificates from other directory authorities

XXX when to download certificates.

Downloading server descriptors from other directory authorities

Periodically (currently, every 10 seconds), directory authorities check whether there are any specific descriptors that they do not have and that they are not currently trying to download. Authorities identify these descriptors by the hashes listed in votes, considering a listed descriptor only if its publication date is more recent than that of the descriptor currently held.

[XXXX need a way to fetch descriptors ahead of the vote? v2 status docs can do that for now.]

If so, the directory authority launches requests to the authorities for these descriptors, such that each authority is only asked for descriptors listed in its most recent vote. If more than one authority lists the descriptor, we choose which to ask at random.

If one of these downloads fails, we do not try to download that descriptor from the authority that failed to serve it again unless we receive a newer network-status (consensus or vote) from that authority that lists the same descriptor.

   Directory authorities must potentially cache multiple descriptors for each
   router. Authorities must not discard any descriptor listed by any recent
   consensus.  If there is enough space to store additional descriptors,
   authorities SHOULD try to hold those which clients are likely to download the
   most.  (Currently, this is judged based on the interval for which each
   descriptor seemed newest.)
[XXXX define recent]

Authorities SHOULD NOT download descriptors for routers that they would immediately reject for reasons listed in section 3.2.

Downloading extra-info documents from other directory authorities

Periodically, an authority checks whether it is missing any extra-info documents: in other words, if it has any server descriptors with an extra-info-digest field that does not match any of the extra-info documents currently held. If so, it downloads whatever extra-info documents are missing. We follow the same splitting and back-off rules as in section 3.6.

Computing a consensus from a set of votes

Given a set of votes, authorities compute the contents of the consensus.

The consensus status, along with as many signatures as the server currently knows (see section 3.10 below), should be available at

http://<hostname>/tor/status-vote/next/consensus.z

The contents of the consensus document are as follows:

The "valid-after", "valid-until", and "fresh-until" times are taken as the median of the respective values from all the votes.

The times in the "voting-delay" line are taken as the median of the VoteSeconds and DistSeconds times in the votes.

Known-flags is the union of all flags known by any voter.

Entries are given on the "params" line for every keyword on which a majority of authorities (total authorities, not just those participating in this vote) voted, or for which at least three authorities voted. The values given are the low-median of all votes on that keyword.

(In consensus methods 7 to 11 inclusive, entries were given on the "params" line for every keyword on which any authority voted, the value given being the low-median of all votes on that keyword.)
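
The current inclusion rule and the low-median can be sketched as below. The function name is hypothetical; each vote is assumed to be given as a mapping from keyword to integer value, and the low-median is taken to be the lower of the two middle values when the number of votes is even.

```python
from statistics import median_low


def consensus_params(votes, total_authorities: int) -> dict:
    """Compute consensus "params" entries from per-vote parameter mappings."""
    values_by_keyword = {}
    for vote in votes:
        for keyword, value in vote.items():
            values_by_keyword.setdefault(keyword, []).append(value)
    result = {}
    for keyword in sorted(values_by_keyword):
        values = values_by_keyword[keyword]
        # Include the keyword if a majority of all authorities voted on it,
        # or if at least three authorities did.
        if len(values) > total_authorities / 2 or len(values) >= 3:
            result[keyword] = median_low(values)
    return result
```

For instance, four votes of 1, 3, 2, and 10 on the same keyword give a consensus value of 2.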

    "client-versions" and "server-versions" are sorted in ascending
     order; A version is recommended in the consensus if it is recommended
     by more than half of the voting authorities that included a
     client-versions or server-versions lines in their votes.
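
The "more than half" rule for version lines can be sketched as follows. The function name is hypothetical, and each vote is assumed to be represented as a list of version strings, or None when the vote did not include the line. Ordering the result is left out of the sketch, since it depends on the version comparison rules defined elsewhere.

```python
from collections import Counter


def recommended_versions(votes) -> set:
    """Versions listed by more than half of the votes that had the line."""
    present = [vote for vote in votes if vote is not None]
    counts = Counter()
    for versions in present:
        # Count each version at most once per vote.
        counts.update(set(versions))
    return {v for v, n in counts.items() if n > len(present) / 2}
```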

With consensus method 19 or later, a package line is generated for a given PACKAGENAME/VERSION pair if at least three authorities list such a package in their votes. (Call these lines the "input" lines for PACKAGENAME.) The consensus will contain every "package" line that is listed verbatim by more than half of the authorities listing a line for the PACKAGENAME/VERSION pair, and no others.

The authority item groups (dir-source, contact, fingerprint, vote-digest) are taken from the votes of the voting authorities. These groups are sorted by the digests of the authorities' identity keys, in ascending order. If the consensus method is 3 or later, a dir-source line must be included for every vote with a legacy-key entry, using the legacy-key's fingerprint, the voter's ordinary nickname with the string "-legacy" appended, and all other fields as from the original vote's dir-source line.

     A router status entry:
        * is included in the result if some router status entry with the same
          identity is included by more than half of the authorities (total
          authorities, not just those whose votes we have).
          (Consensus method earlier than 21)

        * is included according to the rules in section 3.8.0.1 and
          3.8.0.2 below. (Consensus method 22 or later)

        * For any given RSA identity digest, we include at most
          one router status entry.

        * For any given Ed25519 identity, we include at most one router
          status entry.

        * A router entry has a flag set if that is included by more than half
          of the authorities who care about that flag.

        * Two router entries are "the same" if they have the same
          <descriptor digest, published time, nickname, IP, ports> tuple.
          We choose the tuple for a given router as whichever tuple appears
          for that router in the most votes.  We break ties first in favor of
          the more recently published, then in favor of smaller server
          descriptor digest.

       [
        * The Named flag appears if it is included for this routerstatus by
          _any_ authority, and if all authorities that list it list the same
          nickname. However, if consensus-method 2 or later is in use, and
          any authority calls this identity/nickname pair Unnamed, then
          this routerstatus does not get the Named flag.

        * If consensus-method 2 or later is in use, the Unnamed flag is
          set for a routerstatus if any authorities have voted for a different
          identity to be Named with that nickname, or if any authority
          lists that nickname/ID pair as Unnamed.

          (With consensus-method 1, Unnamed is set like any other flag.)

          [But note that authorities no longer vote for the Named flag,
          and the above two bulletpoints are now irrelevant.]
       ]

        * The version is given as whichever version is listed by the most
          voters, with ties decided in favor of more recent versions.

        * If consensus-method 4 or later is in use, then routers that
          do not have the Running flag are not listed at all.

        * If consensus-method 5 or later is in use, then the "w" line
          is generated using a low-median of the bandwidth values from
          the votes that included "w" lines for this router.

        * If consensus-method 5 or later is in use, then the "p" line
          is taken from the votes that have the same policy summary
          for the descriptor we are listing.  (They should all be the
          same.  If they are not, we pick the most commonly listed
          one, breaking ties in favor of the lexicographically larger
          vote.)  The port list is encoded as specified in section 3.8.2.

        * If consensus-method 6 or later is in use and if 3 or more
          authorities provide a Measured= keyword in their votes for
          a router, the authorities produce a consensus containing a
          Bandwidth= keyword equal to the median of the Measured= votes.

        * If consensus-method 7 or later is in use, the params line is
          included in the output.

        * If the consensus method is under 11, bad exits are considered as
          possible exits when computing bandwidth weights.  Otherwise, if
          method 11 or later is in use, any router that is determined to get
          the BadExit flag doesn't count when we're calculating weights.

        * If consensus method 12 or later is used, only consensus
          parameters that more than half of the total number of
          authorities voted for are included in the consensus.

        [ As of 0.2.6.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 13. ]

        * If consensus method 13 or later is used, microdesc consensuses
          omit any router for which no microdesc was agreed upon.

        * If consensus method 14 or later is used, the ns consensus and
          microdescriptors may include an "a" line for each router, listing
          an IPv6 OR port.

        * If consensus method 15 or later is used, microdescriptors
          include "p6" lines including IPv6 exit policies.

        * If consensus method 16 or later is used, ntor-onion-key lines
          are included in microdescriptors.

        * If consensus method 17 or later is used, authorities impose a
          maximum on the Bandwidth= values that they'll put on a 'w'
          line for any router that doesn't have at least 3 measured
          bandwidth values in votes. They also add an "Unmeasured=1"
          flag to such 'w' lines.

        * If consensus method 18 or later is used, authorities include
          "id" lines in microdescriptors. This method adds RSA ids.

        * If consensus method 19 or later is used, authorities may include
          "package" lines in consensuses.

        * If consensus method 20 or later is used, authorities may include
          GuardFraction information in microdescriptors.

        * If consensus method 21 or later is used, authorities may include
          an "id" line for ed25519 identities in microdescriptors.

        [ As of 0.2.8.2-alpha, authorities no longer advertise or negotiate
          consensus method 21, because it contains bugs. ]

        * If consensus method 22 or later is used, and the votes do not
          produce a majority consensus about a relay's Ed25519 key (see
          3.8.0.1 below), the consensus must include a NoEdConsensus flag on
          the "s" line for every relay whose listed Ed key does not reflect
          consensus.

        * If consensus method 23 or later is used, authorities include
          shared randomness protocol data on their votes and consensus.

        * If consensus-method 24 or later is in use, then routers that
          do not have the Valid flag are not listed at all.

        [ As of 0.3.4.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 25. ]

        * If consensus-method 25 or later is in use, then we vote
          on recommended-protocols and required-protocols lines in the
          consensus.  We also include protocols lines in routerstatus
          entries.

        * If consensus-method 26 or later is in use, then we initialize
          bandwidth weights to 1 in our calculations, to avoid
          division-by-zero errors on unusual networks.

        * If consensus method 27 or later is used, the microdesc consensus
          may include an "a" line for each router, listing an IPv6 OR port.

        [ As of 0.4.3.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 28. ]

        * If consensus method 28 or later is used, microdescriptors no longer
          include "a" lines.

        * If consensus method 29 or later is used, microdescriptor "family"
          lines are canonicalized to improve compression.

        * If consensus method 30 or later is used, the base64 encoded
          ntor-onion-key does not include the trailing = sign.

        * If consensus method 31 or later is used, authorities parse the
          "bwweightscale" and "maxunmeasuredbw" parameters correctly when
          computing votes.

        * If consensus method 32 or later is used, authorities handle the
          "MiddleOnly" flag specially when computing a consensus.  When the
          voters agree to include "MiddleOnly" in a routerstatus, they
          automatically remove "Exit", "Guard", "V2Dir", and "HSDir".  If
          the BadExit flag is included in the consensus, they automatically
          add it to the routerstatus.

        * If consensus method 33 or later is used, and the consensus
          flavor is "microdesc", then the "Publication" field in the "r"
          line is set to "2038-01-01 00:00:00".

The signatures at the end of a consensus document are sorted in ascending order by identity digest.

All ties in computing medians are broken in favor of the smaller or earlier item.
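
As an illustration of this tie-breaking rule, a low-median can be sketched in Python as follows. The function name is ours and is not part of this specification.

```python
def low_median(values):
    """Return the median of values, breaking ties toward the smaller item.

    With an even number of items there are two middle items; this picks
    the lower one rather than averaging them.
    """
    if not values:
        raise ValueError("low_median() needs at least one value")
    ordered = sorted(values)
    # (n - 1) // 2 is the middle index for odd n and the lower of the
    # two middle indices for even n.
    return ordered[(len(ordered) - 1) // 2]
```

For example, the low-median of the four bandwidth values 10, 20, 30, 40 is 20, not 25.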

Deciding which Ids to include.

This sorting algorithm is used for consensus-method 22 and later.

  First, consider each listing by tuple of <Ed,Rsa> identities, where 'Ed'
    may be "None" if the voter included "id ed25519 none" to indicate that
    the authority knows what ed25519 identities are, and thinks that the RSA
    key doesn't have one.

  For each such <Ed, RSA> tuple that is listed by more than half of the
    total authorities (not just total votes), include it.  (It is not
    possible for any other <id-Ed, id-RSA'> to have as many votes.)  If more
    than half of the authorities list a single <Ed,Rsa> pair of this type, we
    consider that Ed key to be "consensus"; see description of the
    NoEdConsensus flag.

  Log any other id-RSA values corresponding to an id-Ed we included, and any
    other id-Ed values corresponding to an id-RSA we included.

  For each <id-RSA> that is not yet included, if it is listed by more than
    half of the total authorities, and we do not already have it listed with
    some <id-Ed>, include it, but do not consider its Ed identity canonical.
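
The steps above can be sketched in Python roughly as follows. This is a simplified illustration, not the reference implementation: the function name and input shape are assumptions, every voter is assumed to state an Ed opinion (possibly the string "None") for each RSA identity it lists, and the logging step is omitted.

```python
from collections import Counter


def choose_identities(votes, total_authorities):
    """Return (pairs, rsa_only) for the consensus.

    votes: one collection of (ed, rsa) pairs per vote we have.
    total_authorities: number of authorities, not just votes received.
    pairs: <Ed, RSA> tuples listed by more than half of all authorities.
    rsa_only: RSA identities that reach a majority only when their Ed
        opinions are ignored; their Ed identity is not canonical.
    """
    pair_votes = Counter()
    rsa_votes = Counter()
    for vote in votes:
        pairs_in_vote = set(vote)
        pair_votes.update(pairs_in_vote)
        rsa_votes.update({rsa for _ed, rsa in pairs_in_vote})

    def is_majority(count):
        # "More than half of the total authorities."
        return 2 * count > total_authorities

    pairs = sorted(p for p, n in pair_votes.items() if is_majority(n))
    included_rsa = {rsa for _ed, rsa in pairs}
    rsa_only = sorted(
        rsa for rsa, n in rsa_votes.items()
        if is_majority(n) and rsa not in included_rsa
    )
    return pairs, rsa_only
```

In this sketch an RSA identity whose voters split between two different Ed keys can still be included through the second list, as long as more than half of all authorities list the RSA identity itself.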

Deciding which descriptors to include.

A tuple belongs to an <id-RSA, id-Ed> identity if it is a new tuple that matches both ID parts, or if it is an old tuple (one with no Ed opinion) that matches the RSA part. A tuple belongs to an <id-RSA> identity if its RSA identity matches.

A tuple matches another tuple if all the fields that are present in both tuples are the same.

For every included identity, consider the tuples belonging to that identity. Group them into sets of matching tuples. Include the tuple that matches the largest set, breaking ties in favor of the most recently published, and then in favor of the smaller server descriptor digest.
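
A minimal Python sketch of this selection follows. It assumes the simple case in which every tuple carries all of its fields, so that "matching" reduces to equality; the Desc type and its field names are hypothetical, and publication times are assumed to be "YYYY-MM-DD HH:MM:SS" strings, which sort chronologically.

```python
from collections import Counter
from typing import NamedTuple


class Desc(NamedTuple):
    digest: str      # hex-encoded server descriptor digest
    published: str   # "YYYY-MM-DD HH:MM:SS"


def choose_descriptor(tuples):
    """Pick the tuple listed in the most votes for one identity.

    Ties go to the most recently published tuple, then to the
    smaller descriptor digest.
    """
    counts = Counter(tuples)
    best = max(counts.values())
    candidates = [t for t, n in counts.items() if n == best]
    latest = max(t.published for t in candidates)
    candidates = [t for t in candidates if t.published == latest]
    return min(candidates, key=lambda t: t.digest)
```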

Forward compatibility

Future versions of Tor will need to include new information in the consensus documents, but it is important that all authorities (or at least half) generate and sign the same signed consensus.

To achieve this, authorities list in their votes their supported methods for generating consensuses from votes. Later methods will be assigned higher numbers. Currently specified methods:

     "1" -- The first implemented version.
     "2" -- Added support for the Unnamed flag.
     "3" -- Added legacy ID key support to aid in authority ID key rollovers
     "4" -- No longer list routers that are not running in the consensus
     "5" -- adds support for "w" and "p" lines.
     "6" -- Prefers measured bandwidth values rather than advertised
     "7" -- Provides keyword=integer pairs of consensus parameters
     "8" -- Provides microdescriptor summaries
     "9" -- Provides weights for selecting flagged routers in paths
     "10" -- Fixes edge case bugs in router flag selection weights
     "11" -- Don't consider BadExits when calculating bandwidth weights
     "12" -- Params are only included if enough auths voted for them
     "13" -- Omit router entries with missing microdescriptors.
     "14" -- Adds support for "a" lines in ns consensuses and microdescriptors.
     "15" -- Adds support for "p6" lines.
     "16" -- Adds ntor keys to microdescriptors
     "17" -- Adds "Unmeasured=1" flags to "w" lines
     "18" -- Adds 'id' to microdescriptors.
     "19" -- Adds "package" lines to consensuses
     "20" -- Adds GuardFraction information to microdescriptors.
     "21" -- Adds Ed25519 keys to microdescriptors.
     "22" -- Instantiates Ed25519 voting algorithm correctly.
     "23" -- Adds shared randomness protocol data.
     "24" -- No longer lists routers that are not Valid in the consensus.
     "25" -- Vote on recommended-protocols and required-protocols.
     "26" -- Initialize bandwidth weights to 1 to avoid division-by-zero.
     "27" -- Adds support for "a" lines in microdescriptor consensuses.
     "28" -- Removes "a" lines from microdescriptors.
     "29" -- Canonicalizes families in microdescriptors.
     "30" -- Removes padding from ntor-onion-key.
     "31" -- Uses correct parsing for bwweightscale and maxunmeasuredbw
             when computing weights
     "32" -- Adds special handling for the MiddleOnly flag.
     "33" -- Sets the "Publication" field in microdesc consensus "r" lines
             to a fixed value.

Before generating a consensus, an authority must decide which consensus method to use. To do this, it looks for the highest version number supported by more than 2/3 of the authorities voting. If it supports this method, then it uses it. Otherwise, it falls back to the newest consensus method that it supports (which will probably not result in a sufficiently signed consensus).
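
The method selection just described can be sketched in Python as follows. The function and parameter names are assumptions made for this example.

```python
def choose_consensus_method(methods_by_voter, my_methods):
    """Pick the consensus method an authority should use.

    methods_by_voter: one set of supported method numbers per
        authority that voted.
    my_methods: non-empty set of methods this authority supports.
    """
    n_voters = len(methods_by_voter)
    all_methods = set().union(*methods_by_voter)
    # Methods supported by more than 2/3 of the voting authorities.
    popular = [
        m for m in all_methods
        if 3 * sum(m in s for s in methods_by_voter) > 2 * n_voters
    ]
    if popular and max(popular) in my_methods:
        return max(popular)
    # Fall back to our newest method; the result will probably not
    # gather enough signatures.
    return max(my_methods)
```

With four voters, a method needs support from at least three of them, since 3 * 3 > 2 * 4 but 3 * 2 is not.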

All authorities MUST support method 25; authorities SHOULD support more recent methods as well. Authorities SHOULD NOT support or advertise support for any method before 25. Clients MAY assume that they will never see a current valid signed consensus for any method before method 25.

(The consensuses generated by new methods must be parsable by implementations that only understand the old methods, and must not cause those implementations to compromise their anonymity. This is a means for making changes in the contents of consensus; not for making backward-incompatible changes in their format.)

The following methods have incorrect implementations; authorities SHOULD NOT advertise support for them:

"21" -- Did not correctly enable support for ed25519 key collation.

Encoding port lists

Whether the summary shows the list of accepted ports or the list of rejected ports depends on which list is shorter (has a shorter string representation). In case of ties we choose the list of accepted ports. As an exception to this rule an allow-all policy is represented as "accept 1-65535" instead of "reject " and a reject-all policy is similarly given as "reject 1-65535".

Summary items are compressed: instead of "80-88,89-100" there is only a single item, "80-100", and similarly instead of "20,21" a summary will say "20-21".

Port lists are sorted in ascending order.

The maximum allowed length of a policy summary (including the "accept " or "reject ") is 1000 characters. If a summary exceeds that length we use an accept-style summary and list as much of the port list as is possible within these 1000 bytes. [XXXX be more specific.]
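
The encoding rules in this section can be sketched in Python as follows. The function names are ours. The sketch deliberately leaves out the 1000-character limit, because the truncation rule is not fully specified above.

```python
def compress(ports):
    """Render a set of ports as sorted, merged ranges, e.g. "20-21,80"."""
    runs = []
    run_start = prev = None
    for p in sorted(ports):
        if prev is not None and p == prev + 1:
            prev = p  # extend the current run
            continue
        if prev is not None:
            runs.append((run_start, prev))
        run_start = prev = p
    if prev is not None:
        runs.append((run_start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def summarize(accepted_ports):
    """Build an exit policy summary from the set of accepted ports."""
    accepted = set(accepted_ports)
    rejected = set(range(1, 65536)) - accepted
    if not rejected:
        return "accept 1-65535"
    if not accepted:
        return "reject 1-65535"
    acc, rej = compress(accepted), compress(rejected)
    # The shorter string representation wins; ties go to "accept".
    if len(acc) <= len(rej):
        return "accept " + acc
    return "reject " + rej
```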

Computing Bandwidth Weights

Let weight_scale = 10000, or the value of the "bwweightscale" parameter. (Before consensus method 31 there was a bug in parsing bwweightscale, so that if there were any consensus parameters after it alphabetically, it would always be treated as 10000. A similar bug existed for "maxunmeasuredbw".)

Starting with consensus method 26, G, M, E, and D are initialized to 1 and T to 4. Prior consensus methods initialize them all to 0. With this change, test tor networks that are small or new are much more likely to produce bandwidth-weights in their consensus. The extra bandwidth has a negligible impact on the bandwidth weights in the public tor network.

Let G be the total bandwidth for Guard-flagged nodes. Let M be the total bandwidth for non-flagged nodes. Let E be the total bandwidth for Exit-flagged nodes. Let D be the total bandwidth for Guard+Exit-flagged nodes. Let T = G+M+E+D

Let Wgd be the weight for choosing a Guard+Exit for the guard position. Let Wmd be the weight for choosing a Guard+Exit for the middle position. Let Wed be the weight for choosing a Guard+Exit for the exit position.

Let Wme be the weight for choosing an Exit for the middle position. Let Wmg be the weight for choosing a Guard for the middle position.

Let Wgg be the weight for choosing a Guard for the guard position. Let Wee be the weight for choosing an Exit for the exit position.

Balanced network conditions then arise from solutions to the following system of equations:

    Wgg*G + Wgd*D == M + Wmd*D + Wme*E + Wmg*G  (guard bw = middle bw)
    Wgg*G + Wgd*D == Wee*E + Wed*D              (guard bw = exit bw)
    Wed*D + Wmd*D + Wgd*D == D                  (aka: Wed+Wmd+Wgd = weight_scale)
    Wmg*G + Wgg*G == G                          (aka: Wgg = weight_scale-Wmg)
    Wme*E + Wee*E == E                          (aka: Wee = weight_scale-Wme)

We are short 2 constraints with the above set. The remaining constraints come from examining different cases of network load. The following constraints are used in consensus method 10 and above. Consensus method 9 used a different, incorrect and now obsolete set of constraints for these same cases. For those, see dir-spec.txt in Tor 0.2.2.10-alpha to 0.2.2.16-alpha.

Case 1: E >= T/3 && G >= T/3 (Neither Exit nor Guard Scarce)

In this case, the additional two constraints are: Wmg == Wmd, Wed == 1/3.

    This leads to the solution:
        Wgd = weight_scale/3
        Wed = weight_scale/3
        Wmd = weight_scale/3
        Wee = (weight_scale*(E+G+M))/(3*E)
        Wme = weight_scale - Wee
        Wmg = (weight_scale*(2*G-E-M))/(3*G)
        Wgg = weight_scale - Wmg
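
As a worked illustration of Case 1, the solution above can be computed with integer arithmetic as in the following Python sketch. The function name and the returned dict are assumptions of the example; G, M, E and D are the bandwidth totals defined earlier.

```python
def case1_weights(G, M, E, D, weight_scale=10000):
    """Compute the Case 1 weights (neither Exit nor Guard scarce)."""
    T = G + M + E + D
    # Case 1 requires E >= T/3 and G >= T/3; compare without division.
    if 3 * E < T or 3 * G < T:
        raise ValueError("Case 1 does not apply to these totals")
    Wgd = Wed = Wmd = weight_scale // 3
    Wee = (weight_scale * (E + G + M)) // (3 * E)
    Wme = weight_scale - Wee
    # 2*G - E - M is non-negative whenever G >= T/3, so floor
    # division and truncating division agree here.
    Wmg = (weight_scale * (2 * G - E - M)) // (3 * G)
    Wgg = weight_scale - Wmg
    return {"Wgd": Wgd, "Wed": Wed, "Wmd": Wmd, "Wee": Wee,
            "Wme": Wme, "Wmg": Wmg, "Wgg": Wgg}
```

For G = E = 200 and M = D = 100 this gives Wee = 8333 and Wmg = 1666, which puts roughly the same weighted bandwidth (about 2000000 in units of weight_scale) in each of the guard, middle and exit positions.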

  Case 2: E < T/3 && G < T/3 (Both are scarce)

Let R denote the more scarce class (Rare) between Guard vs Exit. Let S denote the less scarce class.

Subcase a: R+D < S

In this subcase, we simply devote all of D bandwidth to the scarce class.

       Wgg = Wee = weight_scale
       Wmg = Wme = Wmd = 0;
       if E < G:
         Wed = weight_scale
         Wgd = 0
       else:
         Wed = 0
         Wgd = weight_scale

    Subcase b: R+D >= S

In this case, if M <= T/3, we have enough bandwidth to try to achieve a balancing condition.

Add constraints Wgg = weight_scale, Wmd == Wgd to maximize bandwidth in the guard position while still allowing exits to be used as middle nodes:

         Wee = (weight_scale*(E - G + M))/E
         Wed = (weight_scale*(D - 2*E + 4*G - 2*M))/(3*D)
         Wme = (weight_scale*(G-M))/E
         Wmg = 0
         Wgg = weight_scale
         Wmd = (weight_scale - Wed)/2
         Wgd = (weight_scale - Wed)/2

If this system ends up with any values out of range (i.e., negative or above weight_scale), use the constraints Wgg == weight_scale and Wee == weight_scale, since both those positions are scarce:

         Wgg = weight_scale
         Wee = weight_scale
         Wed = (weight_scale*(D - 2*E + G + M))/(3*D)
         Wmd = (weight_scale*(D - 2*M + G + E))/(3*D)
         Wme = 0
         Wmg = 0
         Wgd = weight_scale - Wed - Wmd

      If M > T/3, then the Wmd weight above will become negative. Set it to 0
      in this case:
         Wmd = 0
         Wgd = weight_scale - Wed

  Case 3: One of E < T/3 or G < T/3

    Let S be the scarce class (of E or G).

    Subcase a: (S+D) < T/3:
      if G=S:
        Wgg = Wgd = weight_scale;
        Wmd = Wed = Wmg = 0;
        // Minor subcase, if E is more scarce than M,
        // keep its bandwidth in place.
        if (E < M) Wme = 0;
        else Wme = (weight_scale*(E-M))/(2*E);
        Wee = weight_scale-Wme;
      if E=S:
        Wee = Wed = weight_scale;
        Wmd = Wgd = Wme = 0;
        // Minor subcase, if G is more scarce than M,
        // keep its bandwidth in place.
        if (G < M) Wmg = 0;
        else Wmg = (weight_scale*(G-M))/(2*G);
        Wgg = weight_scale-Wmg;

    Subcase b: (S+D) >= T/3
      if G=S:
        Add constraints Wgg = weight_scale, Wmd == Wed to maximize bandwidth
        in the guard position, while still allowing exits to be
        used as middle nodes:
          Wgg = weight_scale
          Wgd = (weight_scale*(D - 2*G + E + M))/(3*D)
          Wmg = 0
          Wee = (weight_scale*(E+M))/(2*E)
          Wme = weight_scale - Wee
          Wmd = (weight_scale - Wgd)/2
          Wed = (weight_scale - Wgd)/2
      if E=S:
        Add constraints Wee == weight_scale, Wmd == Wgd to maximize bandwidth
        in the exit position:
          Wee = weight_scale;
          Wed = (weight_scale*(D - 2*E + G + M))/(3*D);
          Wme = 0;
          Wgg = (weight_scale*(G+M))/(2*G);
          Wmg = weight_scale - Wgg;
          Wmd = (weight_scale - Wed)/2;
          Wgd = (weight_scale - Wed)/2;

To ensure consensus, all calculations are performed using integer math with a fixed precision determined by the bwweightscale consensus parameter (default: 10000, Min: 1, Max: INT32_MAX). (See the note above about the parsing bug in bwweightscale before consensus method 31.)

For future balancing improvements, Tor clients support 11 additional weights for directory requests and middle weighting. These weights are currently set at weight_scale, with the exception of the following groups of assignments:

Directory requests use middle weights:

Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm

Handle bridges and strange exit policies:

Wgm=Wgg, Wem=Wee, Weg=Wed

Computing consensus flavors

Consensus flavors are variants of the consensus that clients can choose to download and use instead of the unflavored consensus. The purpose of a consensus flavor is to remove or replace information in the unflavored consensus without forcing clients to download information they would not use anyway.

Directory authorities can produce and serve an arbitrary number of flavors of the same consensus. A downside of creating too many new flavors is that clients will be distinguishable based on which flavor they download. A new flavor should not be created when adding a field instead wouldn't be too onerous.

Examples for consensus flavors include:

      - Publishing hashes of microdescriptors instead of hashes of
        full descriptors (see section 3.9.2).
      - Including different digests of descriptors, instead of the
        perhaps-soon-to-be-totally-broken SHA1.

Consensus flavors are derived from the unflavored consensus once the voting process is complete. This is to avoid consensus synchronization problems.

Every consensus flavor has a name consisting of a sequence of one or more alphanumeric characters and dashes. For compatibility, the original (unflavored) consensus type is called "ns".

The supported consensus flavors are defined as part of the authorities' consensus method.

All consensus flavors have in common that their first line is "network-status-version" where version is 3 or higher, and the flavor is a string consisting of alphanumeric characters and dashes:

"network-status-version" SP version [SP flavor] NL

ns consensus

The ns consensus flavor is equivalent to the unflavored consensus. When the flavor is omitted from the "network-status-version" line, it should be assumed to be "ns". Some implementations may explicitly state that the flavor is "ns" when generating consensuses, but should accept consensuses where the flavor is omitted.
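
A parser for the first line of a consensus might look like the following Python sketch. The function name is an assumption, and the regular expression encodes only the rules stated in this section.

```python
import re

# "network-status-version" SP version [SP flavor] NL
_VERSION_LINE = re.compile(
    r"network-status-version ([0-9]+)(?: ([A-Za-z0-9-]+))?\n"
)


def parse_version_line(line):
    """Return (version, flavor) from the first line of a consensus."""
    m = _VERSION_LINE.fullmatch(line)
    if m is None:
        raise ValueError("not a network-status-version line")
    version = int(m.group(1))
    if version < 3:
        raise ValueError("consensus flavors require version 3 or higher")
    # An omitted flavor means the unflavored consensus, "ns".
    return version, m.group(2) or "ns"
```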

Microdescriptor consensus

The microdescriptor consensus is a consensus flavor that contains microdescriptor hashes instead of descriptor hashes and that omits exit-policy summaries which are contained in microdescriptors. The microdescriptor consensus was designed to contain elements that are small and frequently changing. Clients use the information in the microdescriptor consensus to decide which servers to fetch information about and which servers to fetch information from.

The microdescriptor consensus is based on the unflavored consensus with the exceptions as follows:

"network-status-version" SP version SP "microdesc" NL

[At start, exactly once.]

The flavor name of a microdescriptor consensus is "microdesc".

Changes to router status entries are as follows:

    "r" SP nickname SP identity SP publication SP IP SP ORPort
        SP DirPort NL

        [At start, exactly once.]

        Similar to "r" lines in section 3.4.1, but without the digest element.

    "a" SP address ":" port NL

        [Any number]

        Identical to the "a" lines in section 3.4.1.

(Only included when the vote is generated with consensus-method 14 or later, and the consensus is generated with consensus-method 27 or later.)

"p" ... NL

[At most once]

Not currently generated.

Exit policy summaries are contained in microdescriptors and therefore omitted in the microdescriptor consensus.

"m" SP digest NL

[Exactly once.*]

"digest" is the base64 of the SHA256 hash of the router's microdescriptor with trailing =s omitted. For a given router descriptor digest and consensus method there should only be a single microdescriptor digest in the "m" lines of all votes. If different votes have different microdescriptor digests for the same descriptor digest and consensus method, at least one of the authorities is broken. If this happens, the microdesc consensus should contain whichever microdescriptor digest is most common. If there is no winner, we break ties in favor of the lexically earliest.

[*Before consensus method 13, this field was sometimes erroneously omitted.]
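
For illustration, the "digest" value of an "m" line can be computed with Python's standard library as follows. The function name is ours.

```python
import base64
import hashlib


def microdesc_digest(microdescriptor: bytes) -> str:
    """Base64 of the SHA256 hash of a microdescriptor, without padding."""
    raw = hashlib.sha256(microdescriptor).digest()
    # A 32-byte hash encodes to 44 base64 characters, the last of
    # which is a single "=" that the consensus omits.
    return base64.b64encode(raw).decode("ascii").rstrip("=")
```

The result is always 43 characters long.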

Additionally, a microdescriptor consensus SHOULD use the sha256 digest algorithm for its signatures.

Exchanging detached signatures

Once an authority has computed and signed a consensus network status, it should send its detached signature to each other authority in an HTTP POST request to the URL:

http:///tor/post/consensus-signature

[XXX Note why we support push-and-then-pull.]

All of the detached signatures it knows for consensus status should be available at:

http:///tor/status-vote/next/consensus-signatures.z

Assuming full connectivity, every authority should compute and sign the same consensus including any flavors in each period. Therefore, it isn't necessary to download the consensus or any flavors of it computed by each authority; instead, the authorities only push/fetch each others' signatures. A "detached signature" document contains items as follows:

"consensus-digest" SP Digest NL

[At start, at most once.]

The digest of the consensus being signed.

"valid-after" SP YYYY-MM-DD SP HH:MM:SS NL
"fresh-until" SP YYYY-MM-DD SP HH:MM:SS NL
"valid-until" SP YYYY-MM-DD SP HH:MM:SS NL

[As in the consensus]

"additional-digest" SP flavor SP algname SP digest NL

[Any number.]

For each supported consensus flavor, every directory authority adds one or more "additional-digest" lines. "flavor" is the name of the consensus flavor, "algname" is the name of the hash algorithm that is used to generate the digest, and "digest" is the hex-encoded digest.

The hash algorithm for the microdescriptor consensus flavor is defined as SHA256 with algname "sha256".

    "additional-signature" SP flavor SP algname SP identity SP
         signing-key-digest NL signature.

        [Any number.]

For each supported consensus flavor and defined digest algorithm, every directory authority adds an "additional-signature" line. "flavor" is the name of the consensus flavor. "algname" is the name of the algorithm that was used to hash the identity and signing keys, and to compute the signature. "identity" is the hex-encoded digest of the authority identity key of the signing authority, and "signing-key-digest" is the hex-encoded digest of the current authority signing key of the signing authority.

The "sha256" signature format is defined as the RSA signature of the OAEP+-padded SHA256 digest of the item to be signed. When checking signatures, the signature MUST be treated as valid if the signature material begins with SHA256(document), so that other data can get added later. [To be honest, I didn't fully understand the previous paragraph and only copied it from the proposals. Review carefully. -KL]

"directory-signature"

[As in the consensus; the signature object is the same as in the consensus document.]

Publishing the signed consensus

The voting period ends at the valid-after time. If the consensus has been signed by a majority of authorities, these documents are made available at

http:///tor/status-vote/current/consensus.z

and

http:///tor/status-vote/current/consensus-signatures.z

   [XXX current/consensus-signatures is not currently implemented, as it
    is not used in the voting protocol.]

   [XXX possible future features include support for downloading old
    consensuses.]

   The other vote documents are analogously made available under

http:///tor/status-vote/current/authority.z
http:///tor/status-vote/current/.z
http:///tor/status-vote/current/d/.z
http:///tor/status-vote/current/bandwidth.z

once the voting period ends, regardless of the number of signatures.

The authorities serve another consensus of each flavor "F" from the locations

/tor/status-vote/(current|next)/consensus-F.z. and /tor/status-vote/(current|next)/consensus-F/+....z.

The standard URLs for bandwidth list files first appeared in Tor 0.3.5.

Directory cache operation

All directory caches implement this section, except as noted.

Downloading consensus status documents from directory authorities

All directory caches try to keep a recent network-status consensus document to serve to clients. A cache ALWAYS downloads a network-status consensus if any of the following are true:

      - The cache has no consensus document.
      - The cache's consensus document is no longer valid.

Otherwise, the cache downloads a new consensus document at a randomly chosen time in the first half-interval after its current consensus stops being fresh. (This time is chosen at random to avoid swarming the authorities at the start of each period. The interval size is inferred from the difference between the valid-after time and the fresh-until time on the consensus.)

   [For example, if a cache has a consensus that became valid at 1:00,
    and is fresh until 2:00, that cache will fetch a new consensus at
    a random time between 2:00 and 2:30.]
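
The fetch-time rule can be sketched in Python as follows. The function name is an assumption, and so is the uniform distribution: the text above only says that the time is chosen at random.

```python
import random
from datetime import datetime, timedelta


def pick_fetch_time(valid_after, fresh_until, rng=random):
    """Pick when a cache should fetch the next consensus."""
    # The interval size is inferred from the current consensus.
    interval = fresh_until - valid_after
    half = interval.total_seconds() / 2
    # A random moment in the first half-interval after the consensus
    # stops being fresh.
    return fresh_until + timedelta(seconds=rng.uniform(0, half))
```

With valid-after at 1:00 and fresh-until at 2:00, the result always falls between 2:00 and 2:30.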

Directory caches also fetch consensus flavors from the authorities. Caches check the correctness of consensus flavors, but do not check anything about an unrecognized consensus document beyond its digest and length. Caches serve all consensus flavors from the same locations as the directory authorities.

Downloading server descriptors from directory authorities

Periodically (currently, every 10 seconds), directory caches check whether there are any specific descriptors that they do not have and that they are not currently trying to download. Caches identify these descriptors by hash in the recent network-status consensus documents.

If so, the directory cache launches requests to the authorities for these descriptors.

If one of these downloads fails, we do not try to download that descriptor from the authority that failed to serve it again unless we receive a newer network-status consensus that lists the same descriptor.

Directory caches must potentially cache multiple descriptors for each router. Caches must not discard any descriptor listed by any recent consensus. If there is enough space to store additional descriptors, caches SHOULD try to hold those which clients are likely to download the most. (Currently, this is judged based on the interval for which each descriptor seemed newest.)

[XXXX define recent]

Downloading microdescriptors from directory authorities

Directory mirrors should fetch, cache, and serve each microdescriptor from the authorities.

The microdescriptors with base64 hashes ,, are available at:

http:///tor/micro/d/--[.z]

The hashes are base64 encoded with trailing =s omitted for size and for consistency with the microdescriptor consensus format. Items are separated by -s instead of +s, since the + character is used in base64 encoding.

Directory mirrors should check to make sure that the microdescriptors they're about to serve match the right hashes (either the hashes from the fetch URL or the hashes from the consensus, respectively).

(NOTE: Due to squid proxy url limitations at most 92 microdescriptor hashes can be retrieved in a single request.)
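
A mirror that needs many microdescriptors therefore has to split its requests. The following Python sketch builds only the path part of each request; the function name is ours, and the choice to always request the compressed ".z" form is an assumption of the example.

```python
def microdesc_request_paths(digests, max_per_request=92):
    """Split base64 digests (no trailing "=") into request paths."""
    paths = []
    for i in range(0, len(digests), max_per_request):
        batch = digests[i:i + max_per_request]
        # Hashes are joined with "-" because "+" can occur in base64.
        paths.append("/tor/micro/d/" + "-".join(batch) + ".z")
    return paths
```

One hundred wanted microdescriptors produce two requests, the first with 92 hashes and the second with 8.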

Downloading extra-info documents from directory authorities

Any cache that chooses to cache extra-info documents should implement this section.

Periodically, the Tor instance checks whether it is missing any extra-info documents: in other words, if it has any server descriptors with an extra-info-digest field that does not match any of the extra-info documents currently held. If so, it downloads whatever extra-info documents are missing. Caches download from authorities. We follow the same splitting and back-off rules as in section 4.2.

Consensus diffs

Instead of downloading an entire consensus, clients may download a "diff" document containing an ed-style diff from a previous consensus document. Caches (and authorities) make these diffs as they learn about new consensuses. To do so, they must store a record of older consensuses.

(Support for consensus diffs was added in 0.3.1.1-alpha, and is advertised with the DirCache protocol version "2" or later.)

Consensus diff format

Consensus diffs are formatted as follows:

The first line is "network-status-diff-version 1" NL

The second line is

"hash" SP FromDigest SP ToDigest NL

where FromDigest is the hex-encoded SHA3-256 digest of the signed part of the consensus that the diff should be applied to, and ToDigest is the hex-encoded SHA3-256 digest of the entire consensus resulting from applying the diff. (See 3.4.1 for information on which part of a consensus is signed.)

The third and subsequent lines encode the diff from FromDigest to ToDigest in a limited subset of the ed diff format, as specified in appendix E.
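
The header of a consensus diff can be checked with a few lines of Python. This sketch only splits the document into its three parts; the function name is an assumption, and applying the ed commands from appendix E is out of scope here.

```python
import string


def parse_diff_header(text):
    """Return (from_digest, to_digest, ed_lines) for a consensus diff."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final NL
    if len(lines) < 2 or lines[0] != "network-status-diff-version 1":
        raise ValueError("not a version 1 consensus diff")
    fields = lines[1].split(" ")
    if len(fields) != 3 or fields[0] != "hash":
        raise ValueError("malformed hash line")
    for digest in fields[1:]:
        # A hex-encoded SHA3-256 digest is 64 hex characters.
        if len(digest) != 64 or any(c not in string.hexdigits for c in digest):
            raise ValueError("malformed digest")
    return fields[1], fields[2], lines[2:]
```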

Serving and requesting diffs.

When downloading the current consensus, a client may include an HTTP header of the form

X-Or-Diff-From-Consensus: HASH1, HASH2, ...

where the HASH values are hex-encoded SHA3-256 digests of the signed part of one or more consensuses that the client knows about.

If a cache knows a consensus diff from one of those consensuses to the most recent consensus of the requested flavor, it may send that diff instead of the specified consensus.

Caches also serve diffs from the URIs:

/tor/status-vote/current/consensus/diff/<HASH>/<FPRLIST>.z
/tor/status-vote/current/consensus-<FLAVOR>/diff/<HASH>/<FPRLIST>.z

where FLAVOR is the consensus flavor, defaulting to "ns", and FPRLIST is a +-separated list of recognized authority identity fingerprints as in appendix B.

Retrying failed downloads

See section 5.5 below; it applies to caches as well as clients.

Client operation

Every Tor that is not a directory server (that is, those that do not have a DirPort set) implements this section.

Downloading network-status documents

Each client maintains a list of directory authorities. Insofar as possible, clients SHOULD all use the same list.

  [Newer versions of Tor (0.2.8.1-alpha and later):
   Each client also maintains a list of default fallback directory mirrors
   (fallbacks). Each released version of Tor MAY have a different list,
   depending on the mirrors that satisfy the fallback directory criteria at
   release time.]

Clients try to have a live consensus network-status document at all times. A network-status document is "live" if the time in its valid-after field has passed, and the time in its valid-until field has not passed.

When a client has no consensus network-status document, it downloads it from a randomly chosen fallback directory mirror or authority. Clients prefer fallbacks to authorities, trying them earlier and more frequently. In all other cases, the client downloads from caches randomly chosen from among those believed to be V3 directory servers. (This information comes from the network-status documents.)

After receiving any response, the client MUST discard any network-status documents that it did not request.

On failure, the client waits briefly, then tries that network-status document again from another cache. The client does not build circuits until it has a live network-status consensus document, and it has descriptors for a significant proportion of the routers that it believes are running (this is configurable using torrc options and consensus parameters).

  [Newer versions of Tor (0.2.6.2-alpha and later):
   If the consensus contains Exits (the typical case), Tor will build both
   exit and internal circuits. When bootstrap completes, Tor will be ready
   to handle an application requesting an exit circuit to services like the
   World Wide Web.

If the consensus does not contain Exits, Tor will only build internal circuits. In this case, earlier statuses will have included "internal" as indicated above. When bootstrap completes, Tor will be ready to handle an application requesting an internal circuit to hidden services at ".onion" addresses.

If a future consensus contains Exits, exit circuits may become available.]

(Note: clients can and should pick caches based on the network-status information they have: once they have first fetched network-status info from an authority or fallback, they should not need to go to the authority directly again, and should only choose the fallback at random, based on its consensus weight in the current consensus.)

To avoid swarming the caches whenever a consensus expires, the clients download new consensuses at a randomly chosen time after the caches are expected to have a fresh consensus, but before their consensus will expire. (This time is chosen uniformly at random from the interval between the time 3/4 into the first interval after the consensus is no longer fresh, and 7/8 of the time remaining after that before the consensus is invalid.)

   [For example, if a client has a consensus that became valid at 1:00,
    and is fresh until 2:00, and expires at 4:00, that client will fetch
    a new consensus at a random time between 2:45 and 3:50, since 3/4
    of the one-hour interval is 45 minutes, and 7/8 of the remaining 75
    minutes is 65 minutes.]
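The window described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that "the first interval after the consensus is no longer fresh" has the same length as the interval between valid-after and fresh-until, as it does in the example. No rounding is applied, so the upper bound for the example comes out slightly after 3:50.

```python
import random
from datetime import datetime, timedelta


def consensus_fetch_window(valid_after, fresh_until, valid_until):
    """Return (earliest, latest) times at which to fetch a new consensus."""
    # Assumption: the voting interval equals fresh_until - valid_after.
    interval = fresh_until - valid_after
    # Start 3/4 of the way into the first interval after freshness ends.
    earliest = fresh_until + interval * 3 / 4
    # Stop after 7/8 of the time that then remains before expiry.
    remaining = valid_until - earliest
    latest = earliest + remaining * 7 / 8
    return earliest, latest


def pick_fetch_time(valid_after, fresh_until, valid_until, rng=random):
    """Choose a fetch time uniformly at random within the window."""
    earliest, latest = consensus_fetch_window(
        valid_after, fresh_until, valid_until)
    return earliest + (latest - earliest) * rng.random()
```

For the consensus in the example (valid at 1:00, fresh until 2:00, expiring at 4:00) the window starts at 2:45 and ends 65.625 minutes later.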

Clients may choose to download the microdescriptor consensus instead of the general network status consensus. In that case they should use the same update strategy as for the normal consensus. They should not download more than one consensus flavor.

When a client does not have a live consensus, it will generally use the most recent consensus it has if that consensus is "reasonably live". A "reasonably live" consensus is one that expired less than 24 hours ago.
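The two definitions can be written down as a pair of predicates. This is only a sketch: the function names are hypothetical, and the treatment of the exact boundary instants (inclusive at valid-after, exclusive at valid-until) is an assumption that the text does not settle.

```python
from datetime import datetime, timedelta

# "Reasonably live": expired less than 24 hours ago.
REASONABLY_LIVE_GRACE = timedelta(hours=24)


def is_live(now, valid_after, valid_until):
    """True if valid-after has passed and valid-until has not."""
    return valid_after <= now < valid_until


def is_reasonably_live(now, valid_after, valid_until):
    """True for a live consensus, or one that expired under 24 hours ago.

    Assumption: a live consensus is also treated as reasonably live.
    """
    return valid_after <= now < valid_until + REASONABLY_LIVE_GRACE
```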

Downloading server descriptors or microdescriptors

Clients try to have the best descriptor for each router. A descriptor is "best" if:

      - It is listed in the consensus network-status document.

Periodically (currently every 10 seconds) clients check whether there are any "downloadable" descriptors. A descriptor is downloadable if:

      - It is the "best" descriptor for some router.
      - The descriptor was published at least 10 minutes in the past.
        (This prevents clients from trying to fetch descriptors that the
        mirrors have probably not yet retrieved and cached.)
      - The client does not currently have it.
      - The client is not currently trying to download it.
      - The client would not discard it immediately upon receiving it.
      - The client thinks it is running and valid (see section 5.4.1 below).

If at least 16 known routers have downloadable descriptors, or if enough time (currently 10 minutes) has passed since the last time the client tried to download descriptors, it launches requests for all downloadable descriptors.

When downloading multiple server descriptors, the client chooses multiple mirrors so that:

     - At least 3 different mirrors are used, except when this would result
       in more than one request for under 4 descriptors.
     - No more than 128 descriptors are requested from a single mirror.
     - Otherwise, as few mirrors as possible are used.
   After choosing mirrors, the client divides the descriptors among them
   randomly.

After receiving any response the client MUST discard any descriptors that it did not request.

When a descriptor download fails, the client notes it, and does not consider the descriptor downloadable again until a certain amount of time has passed. (Currently 0 seconds for the first failure, 60 seconds for the second, 5 minutes for the third, 10 minutes for the fourth, and 1 day thereafter.) Periodically (currently once an hour) clients reset the failure count.
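The schedule in the parenthetical can be sketched as a lookup on the failure count. The names below are hypothetical; the delay values are the "current" ones quoted above, and the hourly reset of the failure count is left to the caller.

```python
# Delays, in seconds, after the 1st, 2nd, 3rd and 4th failures.
RETRY_DELAYS = [0, 60, 5 * 60, 10 * 60]
# Delay used for every failure after the fourth.
ONE_DAY = 24 * 60 * 60


def descriptor_retry_delay(failure_count):
    """Seconds to wait before a descriptor becomes downloadable again.

    failure_count is the number of failures so far (1 for the first).
    """
    if failure_count < 1:
        raise ValueError("failure_count starts at 1")
    if failure_count <= len(RETRY_DELAYS):
        return RETRY_DELAYS[failure_count - 1]
    return ONE_DAY
```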

Clients retain the most recent descriptor they have downloaded for each router so long as it is listed in the consensus. If it is not listed, they keep it so long as it is not too old (currently, ROUTER_MAX_AGE=48 hours) and no better router descriptor has been downloaded for the same relay. Caches retain descriptors until they are at least OLD_ROUTER_DESC_MAX_AGE=5 days old.

Clients that choose to download the microdescriptor consensus instead of the general consensus must download the referenced microdescriptors instead of server descriptors. Clients fetch and cache microdescriptors preemptively from dir mirrors when starting up, like they currently fetch descriptors. After bootstrapping, clients only need to fetch the microdescriptors that have changed.

When a client gets a new microdescriptor consensus, it looks to see if there are any microdescriptors it needs to learn, and launches a request for them.

Clients maintain a cache of microdescriptors along with metadata such as when each was last referenced by a consensus, and which identity key it corresponds to. They keep a microdescriptor until it hasn't been mentioned in any consensus for a week. Future clients might cache them for longer or shorter times.

Downloading extra-info documents

Any client that uses extra-info documents should implement this section.

Note that generally, clients don't need extra-info documents.

Periodically, the Tor instance checks whether it is missing any extra-info documents: in other words, if it has any server descriptors with an extra-info-digest field that does not match any of the extra-info documents currently held. If so, it downloads whatever extra-info documents are missing. Clients try to download from caches. We follow the same splitting and back-off rules as in section 5.2.

Using directory information

[XXX This subsection really belongs in path-spec.txt, not here. -KL]

Everyone besides directory authorities uses the approaches in this section to decide which relays to use and what their keys are likely to be. (Directory authorities just believe their own opinions, as in section 3.4.2 above.)

Choosing routers for circuits.

Circuits SHOULD NOT be built until the client has enough directory information: a live consensus network status [XXXX fallback?] and descriptors for at least 1/4 of the relays believed to be running.

A relay is "listed" if it is included by the consensus network-status document. Clients SHOULD NOT use unlisted relays.

These flags are used as follows:

     - Clients SHOULD NOT use non-'Valid' or non-'Running' routers unless
       requested to do so.

     - Clients SHOULD NOT use non-'Fast' routers for any purpose other than
       very-low-bandwidth circuits (such as introduction circuits).

     - Clients SHOULD NOT use non-'Stable' routers for circuits that are
       likely to need to be open for a very long time (such as those used for
       IRC or SSH connections).

     - Clients SHOULD NOT choose non-'Guard' nodes when picking entry guard
       nodes.

   See the "path-spec.txt" document for more details.

Managing naming

(This section is removed; authorities no longer assign the 'Named' flag.)

Software versions

An implementation of Tor SHOULD warn when it has fetched a consensus network-status, and it is running a software version not listed.

Warning about a router's status.

(This section is removed; authorities no longer assign the 'Named' flag.)

Retrying failed downloads

This section applies to caches as well as to clients.

When a client fails to download a resource (a consensus, a router descriptor, a microdescriptor, etc) it waits for a certain amount of time before retrying the download. To determine the amount of time to wait, clients use a randomized exponential backoff algorithm. (Specifically, they use a variation of the "decorrelated jitter" algorithm from https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ .)

The specific formula used to compute the 'i+1'th delay is:

        Delay_{i+1} = MIN(cap, random_between(lower_bound, upper_bound))
          where upper_bound = MAX(lower_bound+1, Delay_i * 3)
                lower_bound = MAX(1, base_delay).

The value of 'cap' is set to INT_MAX; the value of 'base_delay' depends on what is being downloaded, whether the client is fully bootstrapped, how the client is configured, and where it is downloading from. Current base_delay values are:

   Consensus objects, as a non-bridge cache:
         0 (TestingServerConsensusDownloadInitialDelay)

   Consensus objects, as a client or bridge that has bootstrapped:
         0 (TestingClientConsensusDownloadInitialDelay)

   Consensus objects, as a client or bridge that is bootstrapping,
   when connecting to an authority because no "fallback" caches are
   known:
         0 (ClientBootstrapConsensusAuthorityOnlyDownloadInitialDelay)

   Consensus objects, as a client or bridge that is bootstrapping,
   when "fallback" caches are known but connecting to an authority
   anyway:
         6 (ClientBootstrapConsensusAuthorityDownloadInitialDelay)

   Consensus objects, as a client or bridge that is bootstrapping,
   when downloading from a "fallback" cache.
         0 (ClientBootstrapConsensusFallbackDownloadInitialDelay)

   Bridge descriptors, as a bridge-using client when at least one bridge
   is usable:
         10800 (TestingBridgeDownloadInitialDelay)

   Bridge descriptors, otherwise:
         0 (TestingBridgeBootstrapDownloadInitialDelay)

   Other objects, as cache or authority:
         0 (TestingServerDownloadInitialDelay)

   Other objects, as client:
         0 (TestingClientDownloadInitialDelay)
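The delay formula given earlier in this section can be sketched as below. The function name next_delay is hypothetical. Two assumptions are made that the text does not state: random_between is taken to be inclusive at both ends, and INT_MAX is taken to be 2^31 - 1.

```python
import random

# Assumption: INT_MAX for a 32-bit signed int.
INT_MAX = 2**31 - 1


def next_delay(prev_delay, base_delay, cap=INT_MAX, rng=random):
    """Compute Delay_{i+1} from Delay_i, in seconds."""
    lower_bound = max(1, base_delay)
    upper_bound = max(lower_bound + 1, prev_delay * 3)
    # Assumption: both bounds are inclusive.
    return min(cap, rng.randint(lower_bound, upper_bound))
```

For example, with base_delay = 6 (the bootstrapping case where fallbacks are known but an authority is contacted anyway) and a previous delay of 0, the next delay is 6 or 7 seconds; later delays are drawn from a range whose upper end is three times the previous delay.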

Standards compliance

All clients and servers MUST support HTTP 1.0. Clients and servers MAY support later versions of HTTP as well.

HTTP headers

Servers SHOULD set Content-Encoding to the algorithm used to compress the document(s) being served. Recognized algorithms are:

     - "identity"     -- RFC2616 section 3.5
     - "deflate"      -- RFC2616 section 3.5
     - "gzip"         -- RFC2616 section 3.5
     - "x-zstd"       -- The zstandard compression algorithm (www.zstd.net)
     - "x-tor-lzma"   -- The lzma compression algorithm, with a "preset"
                         value no higher than 6.

Clients SHOULD use Accept-Encoding on most directory requests to indicate which of the above compression algorithms they support. If they omit it (as Tor clients did before 0.3.1.1-alpha), then the server should serve only "deflate" or "identity" encoded documents, based on the presence or absence of the ".z" suffix on the requested URL.

Note that for anonymous directory requests (that is, requests made over multi-hop circuits, like those for onion service lookups) implementations SHOULD NOT advertise any Accept-Encoding values other than deflate. To do so would be to create a fingerprinting opportunity.

When receiving multiple documents, clients MUST accept compressed concatenated documents and concatenated compressed documents as equivalent.

Servers MAY set the Content-Length: header. When they do, it should match the number of compressed bytes that they are sending.

Servers MAY include an X-Your-Address-Is: header, whose value is the apparent IP address of the client connecting to them (as a dotted quad). For directory connections tunneled over a BEGIN_DIR stream, servers SHOULD report the IP from which the circuit carrying the BEGIN_DIR stream reached them.

Servers SHOULD disable caching of multiple network statuses or multiple server descriptors. Servers MAY enable caching of single descriptors, single network statuses, the list of all server descriptors, a v1 directory, or a v1 running routers document. XXX mention times.

HTTP status codes

Tor delivers the following status codes. Some were chosen without much thought; other code SHOULD NOT rely on specific status codes yet.

  200 -- the operation completed successfully
      -- the user requested statuses or serverdescs, and none of the ones we
         requested were found (0.2.0.4-alpha and earlier).

  304 -- the client specified an if-modified-since time, and none of the
         requested resources have changed since that time.

  400 -- the request is malformed, or
      -- the URL is for a malformed variation of one of the URLs we support,
          or
      -- the client tried to post to a non-authority, or
      -- the authority rejected a malformed posted document.

  404 -- the requested document was not found.
      -- the user requested statuses or serverdescs, and none of the ones
         requested were found (0.2.0.5-alpha and later).

  503 -- we are declining the request in order to save bandwidth
      -- user requested some items that we ordinarily generate or store,
         but we do not have any available.

Consensus-negotiation timeline.

   Period begins: this is the Published time.
     Everybody sends votes
   Reconciliation: everybody tries to fetch missing votes.
     consensus may exist at this point.
   End of voting period:
     everyone swaps signatures.
   Now it's okay for caches to download
     Now it's okay for clients to download.

   Valid-after/valid-until switchover

General-use HTTP URLs

"Fingerprints" in these URLs are base16-encoded SHA1 hashes.

The most recent v3 consensus should be available at:

http://<hostname>/tor/status-vote/current/consensus.z

Similarly, the v3 microdescriptor consensus should be available at:

http://<hostname>/tor/status-vote/current/consensus-microdesc.z

Starting with Tor version 0.2.1.1-alpha, the consensus is also available at:

http://<hostname>/tor/status-vote/current/consensus/<F1>+<F2>+<F3>.z

(NOTE: Due to squid proxy url limitations at most 96 fingerprints can be retrieved in a single request.)

Where F1, F2, etc. are authority identity fingerprints the client trusts. Servers will only return a consensus if more than half of the requested authorities have signed the document, otherwise a 404 error will be sent back. The fingerprints can be shortened to a length of any multiple of two, using only the leftmost part of the encoded fingerprint. Tor uses 3 bytes (6 hex characters) of the fingerprint.

Clients SHOULD sort the fingerprints in ascending order. Servers MUST accept any order.

Clients SHOULD use this format when requesting consensus documents from directory authority servers and from caches running a version of Tor that is known to support this URL format.
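As a sketch of how a client might assemble such a request, the helper below shortens each fingerprint to its leftmost 6 hex characters, sorts the results in ascending order, and joins them with "+". The function name and the hostname argument are hypothetical. Upper-casing follows the recommendation elsewhere in this document that clients use upper case when base16-encoding fingerprints, and the limit of 96 comes from the note above.

```python
# Limit from the squid proxy note above.
MAX_FINGERPRINTS = 96


def consensus_url(hostname, fingerprints, prefix_hex_chars=6):
    """Build the consensus URL for a set of trusted authority fingerprints.

    fingerprints are base16-encoded identity fingerprints; only the
    leftmost prefix_hex_chars characters of each are sent.
    """
    if prefix_hex_chars % 2 != 0:
        raise ValueError("prefix length must be a multiple of two")
    # Shorten, upper-case, drop duplicates, then sort ascending.
    short = sorted({fp.upper()[:prefix_hex_chars] for fp in fingerprints})
    if not short or len(short) > MAX_FINGERPRINTS:
        raise ValueError("need between 1 and 96 fingerprints")
    return "http://%s/tor/status-vote/current/consensus/%s.z" % (
        hostname, "+".join(short))
```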

A concatenated set of all the current key certificates should be available at:

http://<hostname>/tor/keys/all.z

The key certificate for this server should be available at:

http://<hostname>/tor/keys/authority.z

The key certificate for an authority whose authority identity fingerprint is <F> should be available at:

http://<hostname>/tor/keys/fp/<F>.z

The key certificate whose signing key fingerprint is <F> should be available at:

http://<hostname>/tor/keys/sk/<F>.z

The key certificate whose identity key fingerprint is <F> and whose signing key fingerprint is <S> should be available at:

http://<hostname>/tor/keys/fp-sk/<F>-<S>.z

(As usual, clients may request multiple certificates using:

http://<hostname>/tor/keys/fp-sk/<F1>-<S1>+<F2>-<S2>.z )

[The above fp-sk format was not supported before Tor 0.2.1.9-alpha.]

The most recent descriptor for a server whose identity key has a fingerprint of <F> should be available at:

http://<hostname>/tor/server/fp/<F>.z

The most recent descriptors for servers with identity fingerprints <F1>,<F2>,<F3> should be available at:

http://<hostname>/tor/server/fp/<F1>+<F2>+<F3>.z

(NOTE: Due to squid proxy url limitations at most 96 fingerprints can be retrieved in a single request.

Implementations SHOULD NOT download descriptors by identity key fingerprint. This allows a corrupted server (in collusion with a cache) to provide a unique descriptor to a client, and thereby partition that client from the rest of the network.)

The server descriptor with (descriptor) digest <D> (in hex) should be available at:

http://<hostname>/tor/server/d/<D>.z

The most recent descriptors with digests <D1>,<D2>,<D3> should be available at:

http://<hostname>/tor/server/d/<D1>+<D2>+<D3>.z

The most recent descriptor for this server should be at:

http://<hostname>/tor/server/authority.z

This is used for authorities, and also if a server is configured as a bridge. The official Tor implementations (starting at 0.1.1.x) use this resource to test whether a server's own DirPort is reachable. It is also useful for debugging purposes.

A concatenated set of the most recent descriptors for all known servers should be available at:

http://<hostname>/tor/server/all.z

Extra-info documents are available at the URLs

   http://<hostname>/tor/extra/d/...
   http://<hostname>/tor/extra/fp/...
   http://<hostname>/tor/extra/all[.z]
   http://<hostname>/tor/extra/authority[.z]

(As for /tor/server/ URLs: supports fetching extra-info documents by their digest, by the fingerprint of their servers, or all at once. When serving by fingerprint, we serve the extra-info that corresponds to the descriptor we would serve by that fingerprint. Only directory authorities of version 0.2.0.1-alpha or later are guaranteed to support the first three classes of URLs. Caches may support them, and MUST support them if they have advertised "caches-extra-info".)

For debugging, directories SHOULD expose non-compressed objects at URLs like the above, but without the final ".z". If the client sends an Accept-Encoding header, that header should override the presence or absence of the ".z" (see section 6.1).

Clients SHOULD use upper case letters (A-F) when base16-encoding fingerprints. Servers MUST accept both upper and lower case fingerprints in requests.

Converting a curve25519 public key to an ed25519 public key

Given an X25519 key, that is, an affine point (u,v) on the Montgomery curve defined by

bv^2 = u(u^2 + au +1)

where

a = 486662
b = 1

and comprised of the compressed form (i.e. consisting of only the u-coordinate), we can retrieve the y-coordinate of the affine point (x,y) on the twisted Edwards form of the curve defined by

-x^2 + y^2 = 1 + d x^2 y^2

where

d = - 121665/121666

by computing

y = (u-1)/(u+1).

and then we can apply the usual curve25519 twisted Edwards point decompression algorithm to find an x-coordinate of an affine twisted Edwards point to check signatures with. Signing keys for ed25519 are compressed curve points in twisted Edwards form (so a y-coordinate and the sign of the x-coordinate), and X25519 keys are compressed curve points in Montgomery form (i.e. a u-coordinate).
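The map y = (u-1)/(u+1) is evaluated in the field of integers modulo p = 2^255 - 19, so the division is a multiplication by a modular inverse. The following sketch shows that step, plus the standard Ed25519 compressed encoding (little-endian y with the sign of x in the top bit). The function names are hypothetical, the sign bit must be supplied by the caller, and pow() with a negative exponent requires Python 3.8 or later.

```python
# Field prime shared by curve25519 and ed25519.
P = 2**255 - 19


def montgomery_u_to_edwards_y(u):
    """Map a Montgomery u-coordinate to the Edwards y-coordinate."""
    u %= P
    if (u + 1) % P == 0:
        # u = -1 has no image under this map (division by zero).
        raise ValueError("u + 1 is not invertible")
    # y = (u - 1) / (u + 1), computed modulo P.
    return (u - 1) * pow(u + 1, -1, P) % P


def encode_ed25519_point(y, x_sign_bit):
    """Encode (y, sign of x) as a 32-byte compressed Ed25519 point."""
    if x_sign_bit not in (0, 1):
        raise ValueError("x_sign_bit must be 0 or 1")
    return (y | (x_sign_bit << 255)).to_bytes(32, "little")
```

As a sanity check, the curve25519 base point u = 9 maps to y = 4/5 mod p, which is the y-coordinate of the Ed25519 base point.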

However, note that a compressed point in Montgomery form neglects to encode what the sign of the corresponding twisted Edwards x-coordinate would be. Thus, we need the sign of the x-coordinate to do this operation; otherwise, we'll have two possible x-coordinates that might correspond to the ed25519 public key.

To get the sign, the easiest way is to take the corresponding private key, feed it to the ed25519 public key generation algorithm, and see what the sign is.

[Recomputing the sign bit from the private key every time sounds rather strange and inefficient to me… —isis]

Note that in addition to its coordinates, an expanded Ed25519 private key also has a 32-byte random value, "prefix", used to compute internal r values in the signature. For security, this prefix value should be derived deterministically from the curve25519 key. The Tor implementation derives it as SHA512(private_key | STR)[0..32], where STR is the nul-terminated string:

"Derive high part of ed25519 key from curve25519 key\0"

On the client side, where there is no access to the curve25519 private keys, one may use the curve25519 public key's Montgomery u-coordinate to recover the Montgomery v-coordinate by computing the right-hand side of the Montgomery curve equation:

bv^2 = u(u^2 + au +1)

where

a = 486662
b = 1

Then, knowing the intended sign of the Edwards x-coordinate, one may recover said x-coordinate by computing:

x = (u/v) * sqrt(-a - 2)

Inferring missing proto lines.

The directory authorities no longer allow versions of Tor before 0.2.4.18-rc. But right now, there is no version of Tor in the consensus before 0.2.4.19. Therefore, we should disallow versions of Tor earlier than 0.2.4.19, so that we can have the protocol list for all current Tor versions include:

Cons=1-2 Desc=1-2 DirCache=1 HSDir=1 HSIntro=3 HSRend=1-2 Link=1-4 LinkAuth=1 Microdesc=1-2 Relay=1-2

For Desc, Microdesc and Cons, Tor versions before 0.2.7.stable should be taken to only support version 1.

Limited ed diff format

We support the following format for consensus diffs. It's a subset of the ed diff format, but clients MUST NOT accept other ed commands.

We support the following ed commands, each on a line by itself:

     - "<n1>d"          Delete line n1
     - "<n1>,<n2>d"     Delete lines n1 through n2, inclusive
     - "<n1>,$d"        Delete line n1 through the end of the file, inclusive.
     - "<n1>c"          Replace line n1 with the following block
     - "<n1>,<n2>c"     Replace lines n1 through n2, inclusive, with the
                        following block.
     - "<n1>a"          Append the following block after line n1.

Note that line numbers always apply to the file after all previous commands have already been applied. Note also that line numbers are 1-indexed.

The commands MUST apply to the file from back to front, such that lines are only ever referred to by their position in the original file.

If there are any directory signatures on the original document, the first command MUST be a "<n1>,$d" form to remove all of the directory signatures. Using this format ensures that the client will successfully apply the diff even if they have an unusual encoding for the signatures.

The replace and append commands take blocks. These blocks are simply appended to the diff after the line with the command. A line with just a period (".") ends the block (and is not part of the lines to add). Note that it is impossible to insert a line with just a single dot.
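The command set described in this appendix is small enough to sketch an applier in a few dozen lines. The function below is illustrative only and its name is hypothetical. It rejects any command outside the listed forms, but it does not check the back-to-front ordering requirement or the rule about the first command removing signatures. Accepting "0a" (append before the first line) is an assumption borrowed from ordinary ed.

```python
import re

# <n1>, optionally ",<n2>" or ",$", then exactly one allowed command.
_COMMAND = re.compile(r"([0-9]+)(?:,([0-9]+|\$))?([acd])")


def apply_limited_ed_diff(original_lines, diff_lines):
    """Apply limited-ed commands, in the order given, to a list of lines."""
    lines = list(original_lines)
    pos = 0
    while pos < len(diff_lines):
        match = _COMMAND.fullmatch(diff_lines[pos])
        if match is None:
            raise ValueError("unsupported ed command: %r" % diff_lines[pos])
        pos += 1
        first = int(match.group(1))
        end, op = match.group(2), match.group(3)
        if end == "$":
            if op != "d":
                raise ValueError("'$' is only allowed with the d command")
            last = len(lines)
        elif end is None:
            last = first
        else:
            if op == "a":
                raise ValueError("the a command takes one line number")
            last = int(end)

        # The c and a commands are followed by a block ending with ".".
        block = []
        if op in "ac":
            while True:
                if pos >= len(diff_lines):
                    raise ValueError("block is missing its final '.'")
                text = diff_lines[pos]
                pos += 1
                if text == ".":
                    break
                block.append(text)

        if op == "a":
            # Assumption: "0a" is accepted, as in ordinary ed.
            if not 0 <= first <= len(lines):
                raise ValueError("line number out of range")
            lines[first:first] = block
        else:
            # Line numbers are 1-indexed; for d the block is empty.
            if not 1 <= first <= last <= len(lines):
                raise ValueError("line range out of range")
            lines[first - 1:last] = block
    return lines
```

For instance, applying the commands "4,$d", then "2c" with a two-line block, then "1a" with a one-line block to a five-line file removes the last two lines, replaces line 2 with two lines, and inserts one line after line 1.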

Tor Shared Random Subsystem Specification

This document specifies how the commit-and-reveal shared random subsystem of Tor works. This text used to be proposal 250-commit-reveal-consensus.txt.

Table Of Contents:

    1. Introduction
        1.1. Motivation
        1.2. Previous work
    2. Overview
        2.1. Introduction to our commit-and-reveal protocol
        2.2. Ten thousand feet view of the protocol
        2.3. How we use the consensus [CONS]
            2.3.1. Inserting Shared Random Values in the consensus
        2.4. Persistent State of the Protocol [STATE]
        2.5. Protocol Illustration
    3. Protocol
        3.1 Commitment Phase [COMMITMENTPHASE]
            3.1.1. Voting During Commitment Phase
            3.1.2. Persistent State During Commitment Phase [STATECOMMIT]
        3.2 Reveal Phase
            3.2.1. Voting During Reveal Phase
            3.2.2. Persistent State During Reveal Phase [STATEREVEAL]
        3.3. Shared Random Value Calculation At 00:00UTC
            3.3.1. Shared Randomness Calculation [SRCALC]
        3.4. Bootstrapping Procedure
        3.5. Rebooting Directory Authorities [REBOOT]
    4. Specification [SPEC]
        4.1. Voting
            4.1.1. Computing commitments and reveals [COMMITREVEAL]
            4.1.2. Validating commitments and reveals [VALIDATEVALUES]
            4.1.4. Encoding commit/reveal values in votes [COMMITVOTE]
            4.1.5. Shared Random Value [SRVOTE]
        4.2. Encoding Shared Random Values in the consensus [SRCONSENSUS]
        4.3. Persistent state format [STATEFORMAT]
    5. Security Analysis
        5.1. Security of commit-and-reveal and future directions
        5.2. Predicting the shared random value during reveal phase
        5.3. Partition attacks
            5.3.1. Partition attacks during commit phase
            5.3.2. Partition attacks during reveal phase
    6. Discussion
        6.1. Why the added complexity from proposal 225?
        6.2. Why do you do a commit-and-reveal protocol in 24 rounds?
        6.3. Why can't we recover if the 00:00UTC consensus fails?
    7. Acknowledgements

Introduction

Motivation

For the next generation hidden services project, we need the Tor network to produce a fresh random value every day in such a way that it cannot be predicted in advance or influenced by an attacker.

Currently we need this random value to make the HSDir hash ring unpredictable (#8244), which should resolve a wide class of hidden service DoS attacks and should make it harder for people to gauge the popularity and activity of target hidden services. Furthermore this random value can be used by other systems in need of fresh global randomness like Tor-related protocols (e.g. OnioNS) or even non-Tor-related (e.g. warrant canaries).

Previous work

Proposal 225 specifies a commit-and-reveal protocol that can be run as an external script and have the results be fed to the directory authorities. However, directory authority operators feel unsafe running a third-party script that opens TCP ports and accepts connections from the Internet. Hence, this proposal aims to embed the commit-and-reveal idea in the Tor voting process which should make it smoother to deploy and maintain.

Overview

This proposal alters the Tor consensus protocol such that a random number is generated every midnight by the directory authorities during the regular voting process. The distributed random generator scheme is based on the commit-and-reveal technique.

The proposal also specifies how the final shared random value is embedded in consensus documents so that clients who need it can get it.

Introduction to our commit-and-reveal protocol

Every day, before voting for the consensus at 00:00UTC, each authority generates a new random value and keeps it for the whole day. The authority cryptographically hashes the random value and calls the output its "commitment" value. The original random value is called the "reveal" value.

The idea is that given a reveal value you can cryptographically confirm that it corresponds to a given commitment value (by hashing it). However given a commitment value you should not be able to derive the underlying reveal value. The construction of these values is specified in section [COMMITREVEAL].
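The relationship between the two values can be illustrated with a plain hash commitment. This is a sketch of the idea only: the actual construction, including the hash function and what is hashed, is the one specified in [COMMITREVEAL]. SHA3-256, the 32-byte length, and all function names below are assumptions made for illustration.

```python
import hashlib
import hmac
import os


def make_reveal(num_bytes=32):
    """Generate a fresh random reveal value (illustrative length)."""
    return os.urandom(num_bytes)


def commitment_for(reveal):
    """Hash a reveal value to obtain its commitment (SHA3-256 assumed)."""
    return hashlib.sha3_256(reveal).digest()


def reveal_matches(commitment, reveal):
    """Check that a reveal value corresponds to a commitment."""
    return hmac.compare_digest(commitment_for(reveal), commitment)
```

An authority would publish commitment_for(reveal) during the commit phase and the reveal value itself later; anyone holding both can run reveal_matches to confirm they belong together.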

Ten thousand feet view of the protocol

Our commit-and-reveal protocol aims to produce a fresh shared random value (denoted shared_random_value here and elsewhere) every day at 00:00UTC. The final fresh random value is embedded in the consensus document at that time.

Our protocol has two phases and uses the hourly voting procedure of Tor. Each phase lasts 12 hours, which means that 12 voting rounds happen in between. In short, the protocol works as follows:

Commit phase:

Starting at 00:00UTC and for a period of 12 hours, authorities every hour include their commitment in their votes. They also include any received commitments from other authorities, if available.

Reveal phase:

At 12:00UTC, the reveal phase starts and lasts till the end of the protocol at 00:00UTC. In this stage, authorities must reveal the value they committed to in the previous phase. The commitment and revealed values from other authorities, when available, are also added to the vote.

Shared Randomness Calculation:

At 00:00UTC, the shared random value is computed from the agreed revealed values and added to the consensus.

This concludes the commit-and-reveal protocol every day at 00:00UTC.

How we use the consensus [CONS]

The produced shared random values need to be readily available to clients. For this reason we include them in the consensus documents.

Every hour the consensus documents need to include the shared random value of the day, as well as the shared random value of the previous day. That's because either of these values might be needed at a given time for a Tor client to access a hidden service according to section [TIME-OVERLAP] of proposal 224. This means that both of these two values need to be included in votes as well.

Hence, consensuses need to include:

     (a) The shared random value of the current time period.
     (b) The shared random value of the previous time period.

For this, a new SR consensus method will be needed to indicate which authorities support this new protocol.

Inserting Shared Random Values in the consensus

After voting happens, we need to be careful about how we pick which shared random values (SRV) to put in the consensus, to avoid breaking the consensus because of authorities having different views of the commit-and-reveal protocol (because maybe they missed some rounds of the protocol).

For this reason, authorities look at the received votes before creating a consensus and employ the following logic:

     - First of all, they make sure that the agreed upon consensus method is
       above the SR consensus method.

     - Authorities include an SRV in the consensus if and only if the SRV has
       been voted by at least the majority of authorities.

     - For the consensus at 00:00UTC, authorities include an SRV in the
       consensus if and only if the SRV has been voted by at least
       AuthDirNumAgreements authorities (where AuthDirNumAgreements is a newly
       introduced consensus parameter).

Authorities include in the consensus the most popular SRV that also satisfies the above constraints. Otherwise, no SRV should be included.
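The selection logic above can be sketched as follows. The function and parameter names are hypothetical. "Majority" is read here as strictly more than half of the authorities, and for the 00:00UTC consensus both the majority rule and the AuthDirNumAgreements rule are applied; both readings are assumptions. The consensus-method check is reduced to a boolean, and ties between equally popular values are not handled.

```python
from collections import Counter


def choose_srv(voted_srvs, num_authorities, auth_dir_num_agreements,
               is_midnight_consensus, method_supports_sr=True):
    """Return the SRV to put in the consensus, or None for no SRV.

    voted_srvs holds one entry per authority vote that listed an SRV.
    """
    if not method_supports_sr or not voted_srvs:
        return None
    # The most popular value is the only candidate.
    srv, count = Counter(voted_srvs).most_common(1)[0]
    # Assumption: "majority" means strictly more than half.
    needed = num_authorities // 2 + 1
    if is_midnight_consensus:
        needed = max(needed, auth_dir_num_agreements)
    return srv if count >= needed else None
```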

The above logic is used to make it harder to break the consensus by natural partitioning causes.

We use the AuthDirNumAgreements consensus parameter to enforce that a supermajority of dirauths supports the SR protocol during SRV creation, so that even if a few of those dirauths drop offline in the middle of the run the SR protocol does not get disturbed. We go to extra lengths to ensure this because changing SRVs in the middle of the day has terrible reachability consequences for hidden service clients.

Persistent State of the Protocol [STATE]

A directory authority needs to keep a persistent state on disk of the ongoing protocol run. This allows an authority to join the protocol seamlessly in the case of a reboot.

During the commitment phase, it is populated with the commitments of all authorities. Then during the reveal phase, the reveal values are also stored in the state.

As discussed previously, the shared random values from the current and previous time period must also be present in the state at all times if they are available.

Protocol Illustration

An illustration for better understanding the protocol can be found here:

https://people.torproject.org/~asn/hs_notes/shared_rand.jpg

It reads left-to-right.

The illustration displays what the authorities (A_1, A_2, A_3) put in their votes. A chain 'A_1 -> c_1 -> r_1' denotes that authority A_1 committed to the value c_1 which corresponds to the reveal value r_1.

The illustration depicts only a few rounds of the whole protocol. It starts with the first three rounds of the commit phase, then it jumps to the last round of the commit phase. It continues with the first two rounds of the reveal phase and then it jumps to the final round of the protocol run. It finally shows the first round of the commit phase of the next protocol run (00:00UTC) where the final Shared Random Value is computed. In our fictional example, the SRV was computed with 3 authority contributions and its value is "a56fg39h".

We advise you to revisit this after you have read the whole document.

Protocol

In this section we give a detailed specification of the protocol. We describe the protocol participants' logic and the messages they send. The encoding of the messages is specified in the next section ([SPEC]).

Now we go through the phases of the protocol:

Commitment Phase [COMMITMENTPHASE]

The commit phase lasts from 00:00UTC to 12:00UTC.

During this phase, an authority commits a value in its vote and saves it to the permanent state as well.

Authorities also save any received authoritative commits by other authorities in their permanent state. We call a commit by Alice "authoritative" if it was included in Alice's vote.

Voting During Commitment Phase

During the commit phase, each authority includes in its votes:

- The commitment value for this protocol run.
- Any authoritative commitments received from other authorities.
- The two previous shared random values produced by the protocol (if any).

The commit phase lasts for 12 hours, so authorities have multiple chances to commit their values. An authority MUST NOT commit a second value during a subsequent round of the commit phase.

If an authority publishes a second commitment value in the same commit phase, only the first commitment should be taken into account by other authorities. Any subsequent commitments MUST be ignored.

Persistent State During Commitment Phase [STATECOMMIT]

During the commitment phase, authorities save in their persistent state the authoritative commits they have received from each authority. Only one commit per authority must be considered trusted and active at a given time.
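The "first commit wins" rule can be sketched as below. The in-memory dictionary and the function name are hypothetical stand-ins for the persistent state; a real authority would also write the state to disk.

```python
from typing import Dict


def record_commit(state: Dict[str, str], identity: str, commit: str) -> bool:
    """Store an authoritative commit unless one is already recorded.

    state maps an authority identity fingerprint to its commit for the
    current protocol run (an illustrative layout, not the state file
    format). Returns True if the commit was stored, False if ignored.
    """
    if identity in state:
        # A later commit from the same authority MUST be ignored.
        return False
    state[identity] = commit
    return True
```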

Reveal Phase

The reveal phase lasts from 12:00UTC to 00:00UTC.

Now that the commitments have been agreed on, it's time for authorities to reveal their random values.
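The phase boundaries stated for the commit and reveal phases can be sketched as a function of a vote's valid-after time. The function name is hypothetical, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone


def sr_phase(valid_after: datetime) -> str:
    """Return "commit" for 00:00-11:59 UTC and "reveal" for 12:00-23:59 UTC.

    valid_after must be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    hour = valid_after.astimezone(timezone.utc).hour
    return "commit" if hour < 12 else "reveal"
```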

Voting During Reveal Phase

During the reveal phase, each authority includes in its votes:

- Its reveal value that was previously committed in the commit phase.
- All the commitments and reveals received from other authorities.
- The two previous shared random values produced by the protocol (if any).

The set of commitments has been decided during the commitment phase and must remain the same. If an authority tries to change its commitment during the reveal phase or introduce a new commitment, the new commitment MUST be ignored.

Persistent State During Reveal Phase [STATEREVEAL]

During the reveal phase, authorities keep the authoritative commits from the commit phase in their persistent state. They also save any received reveals that correspond to authoritative commits and are valid (as specified in [VALIDATEVALUES]).

An authority that has just received a reveal value from another authority's vote MUST wait until the next voting round before including that reveal value in its votes.

Shared Random Value Calculation At 00:00UTC

Finally, at 00:00UTC every day, authorities compute a fresh shared random value and this value must be added to the consensus so clients can use it.

Authorities calculate the shared random value using the reveal values in their state as specified in subsection [SRCALC].

Authorities at 00:00UTC start including this new shared random value in their votes, replacing the one from two protocol runs ago. Authorities also start including this new shared random value in the consensus as well.

Apart from that, authorities at 00:00UTC proceed voting normally as they would in the first round of the commitment phase (section [COMMITMENTPHASE]).

Shared Randomness Calculation [SRCALC]

An authority that wants to derive the shared random value SRV, should use the appropriate reveal values for that time period and calculate SRV as follows.

HASHED_REVEALS = H(ID_a | R_a | ID_b | R_b | ..)

SRV = SHA3-256("shared-random" | INT_8(REVEAL_NUM) | INT_4(VERSION) | HASHED_REVEALS | PREVIOUS_SRV)

where the ID_a value is the identity key fingerprint of authority 'a' and R_a is the corresponding reveal value of that authority for the current period.

Also, REVEAL_NUM is the number of revealed values in this construction, VERSION is the protocol version number and PREVIOUS_SRV is the previous shared random value. If no previous shared random value is known, then PREVIOUS_SRV is set to 32 NUL (\x00) bytes.

To maintain consistent ordering in HASHED_REVEALS, all the ID_a | R_a pairs are ordered based on the R_a value in ascending order.
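The calculation above can be sketched as follows. Several details are assumptions made for illustration: H is taken to be SHA3-256, INT_8 and INT_4 are taken to be 8-byte and 4-byte big-endian integers, and the identity fingerprints and reveal values are passed as raw byte strings whose exact encoding this section does not pin down.

```python
import hashlib
from typing import Dict, Optional


def compute_srv(reveals: Dict[bytes, bytes],
                version: int = 1,
                previous_srv: Optional[bytes] = None) -> bytes:
    """Compute SRV from a mapping of authority ID -> reveal value."""
    if previous_srv is None:
        previous_srv = b"\x00" * 32  # 32 NUL bytes when no previous SRV
    # Order the (ID, R) pairs by reveal value, ascending.
    ordered = sorted(reveals.items(), key=lambda item: item[1])
    hashed_reveals = hashlib.sha3_256(
        b"".join(ident + reveal for ident, reveal in ordered)
    ).digest()
    return hashlib.sha3_256(
        b"shared-random"
        + len(reveals).to_bytes(8, "big")   # INT_8(REVEAL_NUM), assumed big-endian
        + version.to_bytes(4, "big")        # INT_4(VERSION), assumed big-endian
        + hashed_reveals
        + previous_srv
    ).digest()
```

Because the pairs are sorted before hashing, two authorities holding the same set of reveals derive the same value regardless of the order in which they received them.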

Bootstrapping Procedure

As described in [CONS], two shared random values are required for the HSDir overlay periods to work properly as specified in proposal 224. Hence clients MUST NOT use the randomness of this system till it has bootstrapped completely; that is, until two shared random values are included in a consensus. This should happen after three 00:00UTC consensuses have been produced, which takes 48 hours.

Rebooting Directory Authorities [REBOOT]

The shared randomness protocol must be able to support directory authorities who leave or join in the middle of the protocol execution.

An authority that commits in the Commitment Phase and then leaves MUST have stored its reveal value on disk so that it continues participating in the protocol if it returns before or during the Reveal Phase. The reveal value MUST be stored timestamped to avoid sending it on wrong protocol runs.

An authority that misses the Commitment Phase cannot commit anymore, so it's unable to participate in the protocol for that run. Same goes for an authority that misses the Reveal phase. Authorities who do not participate in the protocol SHOULD still carry commits and reveals of others in their vote.

Finally, authorities MUST implement their persistent state in such a way that they will never commit two different values in the same protocol run, even if they have to reboot in the middle (assuming that their persistent state file is kept). A suggested way to structure the persistent state is found at [STATEFORMAT].

Specification [SPEC]

Voting

This section describes how commitments, reveals and SR values are encoded in votes. We describe how to encode both the authority's own commitments/reveals and also the commitments/reveals received from the other authorities. Commitments and reveals share the same line, but reveals are optional.

Participating authorities need to include the line:

"shared-rand-participate"

in their votes to announce that they take part in the protocol.

Computing commitments and reveals [COMMITREVEAL]

A directory authority that wants to participate in this protocol needs to create a new pair of commitment/reveal values for every protocol run. Authorities SHOULD generate a fresh pair of such values right before the first commitment phase of the day (at 00:00UTC).

The value REVEAL is computed as follows:

REVEAL = base64-encode( TIMESTAMP || H(RN) )

where RN is the SHA3 hashed value of a 256-bit random value. We hash the random value to avoid exposing raw bytes from our PRNG to the network (see [RANDOM-REFS]).

TIMESTAMP is an 8-byte network-endian time_t value. Authorities SHOULD set TIMESTAMP to the valid-after time of the vote document they first plan to publish their commit into (so usually at 00:00UTC, except if they start up in a later commit round).

The value COMMIT is computed as follows:

COMMIT = base64-encode( TIMESTAMP || H(REVEAL) )
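A minimal sketch of generating a commit/reveal pair from the two formulas in this section. It makes three labeled assumptions: H is SHA3-256, the formulas are read literally (RN is already a hash and is hashed once more inside REVEAL), and H(REVEAL) is computed over the base64-encoded REVEAL string. The function name is hypothetical.

```python
import base64
import hashlib
import os
from typing import Tuple


def generate_commit_reveal(timestamp: int) -> Tuple[bytes, bytes]:
    """Return (COMMIT, REVEAL) as base64-encoded byte strings."""
    # RN: the SHA3 hash of a fresh 256-bit random value.
    rn = hashlib.sha3_256(os.urandom(32)).digest()
    # TIMESTAMP: 8-byte network-endian (big-endian) value.
    ts = timestamp.to_bytes(8, "big")
    reveal = base64.b64encode(ts + hashlib.sha3_256(rn).digest())
    # Assumption: the commit hashes the encoded REVEAL string.
    commit = base64.b64encode(ts + hashlib.sha3_256(reveal).digest())
    return commit, reveal
```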

Validating commitments and reveals [VALIDATEVALUES]

Given a COMMIT message and a REVEAL message it should be possible to verify that they indeed correspond. To do so, the client extracts the random value H(RN) from the REVEAL message, hashes it, and compares it with the H(H(RN)) from the COMMIT message. We say that the COMMIT and REVEAL messages correspond, if the comparison was successful.

Participants MUST also check that corresponding COMMIT and REVEAL values have the same timestamp value.

Authorities should ignore reveal values during the Reveal Phase that don't correspond to commit values published during the Commitment Phase.
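The checks in this section can be sketched as below. As an assumption, the hash comparison follows the COMMIT formula from [COMMITREVEAL] (SHA3-256 over the encoded REVEAL); an implementation that hashes only the H(RN) portion would change the last line. The function name and the 40-byte length check are illustrative.

```python
import base64
import hashlib


def commit_matches_reveal(commit_b64: bytes, reveal_b64: bytes) -> bool:
    """Return True if COMMIT and REVEAL correspond and share a timestamp."""
    try:
        commit = base64.b64decode(commit_b64, validate=True)
        reveal = base64.b64decode(reveal_b64, validate=True)
    except ValueError:
        return False
    # 8-byte timestamp followed by a 32-byte SHA3-256 digest.
    if len(commit) != 40 or len(reveal) != 40:
        return False
    # Both values MUST carry the same timestamp.
    if commit[:8] != reveal[:8]:
        return False
    # Assumption: the commit digest covers the encoded REVEAL string.
    return commit[8:] == hashlib.sha3_256(reveal_b64).digest()
```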

Encoding commit/reveal values in votes [COMMITVOTE]

An authority puts in its vote the commitments and reveals it has produced and seen from the other authorities. To do so, it includes the following in its votes:

"shared-rand-commit" SP VERSION SP ALGNAME SP IDENTITY SP COMMIT [SP REVEAL] NL

where VERSION is the version of the protocol the commit was created with. IDENTITY is the authority's SHA1 identity fingerprint and COMMIT is the encoded commit [COMMITREVEAL]. Authorities during the reveal phase can also optionally include an encoded reveal value REVEAL. There MUST be only one line per authority else the vote is considered invalid. Finally, the ALGNAME is the hash algorithm that should be used to compute COMMIT and REVEAL which is "sha3-256" for version 1.
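A small parsing sketch for the line format above. The class and function names are hypothetical, and the sketch checks only the field count and keyword; it does not validate the fingerprint or the encoded values.

```python
from typing import NamedTuple, Optional


class SrCommitLine(NamedTuple):
    version: int
    algname: str
    identity: str
    commit: str
    reveal: Optional[str]


def parse_sr_commit(line: str) -> SrCommitLine:
    """Parse one "shared-rand-commit" vote line (REVEAL is optional)."""
    parts = line.rstrip("\n").split(" ")
    if len(parts) not in (5, 6) or parts[0] != "shared-rand-commit":
        raise ValueError("not a shared-rand-commit line")
    reveal = parts[5] if len(parts) == 6 else None
    return SrCommitLine(int(parts[1]), parts[2], parts[3], parts[4], reveal)
```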

Shared Random Value [SRVOTE]

Authorities include a shared random value (SRV) in their votes using the following encoding for the previous and current value respectively:

"shared-rand-previous-value" SP NUM_REVEALS SP VALUE NL

"shared-rand-current-value" SP NUM_REVEALS SP VALUE NL

where VALUE is the actual shared random value encoded in hex (computed as specified in section [SRCALC]). NUM_REVEALS is the number of reveal values used to generate this SRV.

To maintain consistent ordering, the shared random values of the previous period should be listed before the values of the current period.
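The two lines and their ordering can be sketched as follows, using the hex encoding stated in this section. The function name and the tuple layout are hypothetical.

```python
from typing import Optional, Tuple

# (NUM_REVEALS, raw SRV bytes); an illustrative layout.
Srv = Tuple[int, bytes]


def srv_lines(previous: Optional[Srv], current: Optional[Srv]) -> str:
    """Render the SRV lines, previous value first, omitting absent ones."""
    out = []
    if previous is not None:
        out.append("shared-rand-previous-value %d %s\n"
                   % (previous[0], previous[1].hex()))
    if current is not None:
        out.append("shared-rand-current-value %d %s\n"
                   % (current[0], current[1].hex()))
    return "".join(out)
```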

Encoding Shared Random Values in the consensus [SRCONSENSUS]

Authorities insert the two active shared random values in the consensus following the same encoding format as in [SRVOTE].

Persistent state format [STATEFORMAT]

As a way to keep ground truth state in this protocol, an authority MUST keep a persistent state of the protocol. The next sub-section suggests a format for this state which is the same as the current state file format.

It contains a preamble, a commitment and reveal section and a list of shared random values.

The preamble (or header) contains the following items. They MUST occur in the order given here:

"Version" SP version NL

[At start, exactly once.]

A document format version. For this specification, version is "1".

"ValidUntil" SP YYYY-MM-DD SP HH:MM:SS NL

[Exactly once]

After this time, this state is expired and shouldn't be used nor trusted. The validity time period is till the end of the current protocol run (the upcoming noon).

The following details the commitment and reveal section. They are encoded the same as in the vote. This makes it easier for implementation purposes.

"Commit" SP version SP algname SP identity SP commit [SP reveal] NL

[Exactly once per authority]

The values are the same as detailed in section [COMMITVOTE].

This line is also used by an authority to store its own value.

Finally, there is the shared random value section.

"SharedRandPreviousValue" SP num_reveals SP value NL

[At most once]

This is the previous shared random value agreed on at the previous period. The fields are the same as in section [SRVOTE].

"SharedRandCurrentValue" SP num_reveals SP value NL

[At most once]

This is the latest shared random value. The fields are the same as in section [SRVOTE].

Security Analysis

Security of commit-and-reveal and future directions

The security of commit-and-reveal protocols is well understood, and has certain flaws. Basically, the protocol is insecure to the extent that an adversary who controls b of the authorities gets to choose among 2^b outcomes for the result of the protocol. However, an attacker who is not a dirauth should not be able to influence the outcome at all.

We believe that this system offers sufficient security especially compared to the current situation. More secure solutions require much more advanced crypto and more complex protocols so this seems like an acceptable solution for now.

Here are some examples of possible future directions:

Schemes based on threshold signatures (e.g. see [HOPPER])

Unicorn scheme by Lenstra et al. [UNICORN]

Schemes based on Verifiable Delay Functions [VDFS]

For more alternative approaches on collaborative random number generation also see the discussion at [RNGMESSAGING].

Predicting the shared random value during reveal phase

The reveal phase lasts 12 hours, and most authorities will send their reveal value on the first round of the reveal phase. This means that an attacker can predict the final shared random value about 12 hours before it's generated.

This does not pose a problem for the HSDir hash ring, since we impose a higher uptime restriction on HSDir nodes, so 12 hours of predictability is not an issue.

Any other protocols using the shared random value from this system should be aware of this property.

Partition attacks

This design is not immune to certain partition attacks. We believe they don't offer much gain to an attacker as they are very easy to detect and difficult to pull off since an attacker would need to compromise a directory authority at the very least. Also, because of the Byzantine generals problem, it's very hard (even impossible in some cases) to protect against all such attacks. Nevertheless, this section describes all possible partition attacks and how to detect them.

Partition attacks during commit phase

A malicious directory authority could send only its commit to one single authority which results in that authority having an extra commit value for the shared random calculation that the others don't have. Since the consensus needs majority, this won't affect the final SRV value. However, the attacker, using this attack, could remove a single directory authority from the consensus decision at 24:00 when the SRV is computed.

An attacker could also partition the authorities by sending two different commitment values to different authorities during the commit phase.

All of the above is fairly easy to detect. Commitment values in the vote coming from an authority should NEVER be different between authorities. If they are, this means an attack is ongoing or there is a very bad bug (highly unlikely).

Partition attacks during reveal phase

Let's consider Alice, a malicious directory authority. Alice could wait until the last reveal round, and reveal its value to half of the authorities. That would partition the authorities into two sets: the ones who think that the shared random value should contain this new reveal, and the rest who don't know about it. This would result in a tie and two different shared random values.

A similar attack is possible. For example, two rounds before the end of the reveal phase, Alice could advertise her reveal value to only half of the dirauths. This way, in the last reveal phase round, half of the dirauths will include that reveal value in their votes and the others will not. In the end of the reveal phase, half of the dirauths will calculate a different shared randomness value than the others.

We claim that this attack is not particularly fruitful: Alice ends up having two shared random values to choose from which is a fundamental problem of commit-and-reveal protocols as well (since the last person can always abort or reveal). The attacker can also sabotage the consensus, but there are other ways this can be done with the current voting system.

Furthermore, we claim that such an attack is very noisy and detectable. First of all, it requires the authority to sabotage two consensuses which will cause quite some noise. Furthermore, the authority needs to send different votes to different auths which is detectable. Like the commit phase attack, the detection here is to make sure that the commitment values in a vote coming from an authority are always the same for each authority.

Discussion

Why the added complexity from proposal 225?

The complexity difference between this proposal and prop225 is in part because prop225 doesn't specify how the shared random value gets to the clients. This proposal spends lots of effort specifying how the two shared random values can always be readily accessible to clients.

Why do you do a commit-and-reveal protocol in 24 rounds?

The reader might be wondering why we span the protocol over the course of a whole day (24 hours), when only 3 rounds would be sufficient to generate a shared random value.

We decided to do it this way, because we piggyback on the Tor voting protocol which also happens every hour.

We could instead run the shared randomness protocol only from 21:00 to 00:00 every day, or run it multiple times a day.

However, we decided that since the shared random value needs to be in every consensus anyway, carrying the commitments/reveals as well will not be a big problem. Also, this way we give more chances for a failing dirauth to recover and rejoin the protocol.

Why can't we recover if the 00:00UTC consensus fails?

If the 00:00UTC consensus fails, there will be no shared random value for the whole day. In theory, we could recover by calculating the shared randomness of the day at 01:00UTC instead. However, the engineering issues with adding such recovery logic are too great. For example, it's not easy for an authority who just booted to learn whether a specific consensus failed to be created.

Acknowledgements

Thanks to everyone who has contributed to this design with feedback and discussion.

Thanks go to arma, ioerror, kernelcorn, nickm, s7r, Sebastian, teor, weasel and everyone else!

References:

[RANDOM-REFS]:
    http://projectbullrun.org/dual-ec/ext-rand.html
    https://lists.torproject.org/pipermail/tor-dev/2015-November/009954.html

[RNGMESSAGING]:
    https://moderncrypto.org/mail-archive/messaging/2015/002032.html

[HOPPER]:
    https://lists.torproject.org/pipermail/tor-dev/2014-January/006053.html

[UNICORN]:
    https://eprint.iacr.org/2015/366.pdf

[VDFS]:
    https://eprint.iacr.org/2018/601.pdf

Tor Path Specification

Roger Dingledine
Nick Mathewson

Note: This is an attempt to specify Tor as currently implemented. Future versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to build circuits and assign streams to circuits. Other implementations MAY take other approaches, but implementors should be aware of the anonymity and load-balancing implications of their choices.

THIS SPEC ISN'T DONE YET.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Table of Contents

    1. General operation
        1.1. Terminology
        1.2. A relay's bandwidth
    2. Building circuits
        2.1. When we build
            2.1.0. We don't build circuits until we have enough directory info
            2.1.1. Clients build circuits preemptively
            2.1.2. Clients build circuits on demand
            2.1.3. Relays build circuits for testing reachability and bandwidth
            2.1.4. Hidden-service circuits
            2.1.5. Rate limiting of failed circuits
            2.1.6. When to tear down circuits
        2.2. Path selection and constraints
            2.2.1. Choosing an exit
            2.2.2. User configuration
        2.3. Cannibalizing circuits
        2.4. Learning when to give up ("timeout") on circuit construction
            2.4.1. Distribution choice and parameter estimation
            2.4.2. How much data to record
            2.4.3. How to record timeouts
            2.4.4. Detecting Changing Network Conditions
            2.4.5. Consensus parameters governing behavior
            2.4.6. Consensus parameters governing behavior
        2.5. Handling failure
    3. Attaching streams to circuits
    4. Hidden-service related circuits
    5. Guard nodes
        5.1. How consensus bandwidth weights factor into entry guard selection
    6. Server descriptor purposes
    7. Detecting route manipulation by Guard nodes (Path Bias)
        7.1. Measuring path construction success rates
        7.2. Measuring path usage success rates
        7.3. Scaling success counts
        7.4. Parametrization
        7.5. Known barriers to enforcement
    X. Old notes
        X.1. Do we actually do this?
        X.2. A thing we could do to deal with reachability.
        X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.

General operation

Tor begins building circuits as soon as it has enough directory information to do so (see section 5 of dir-spec.txt). Some circuits are built preemptively because we expect to need them later (for user traffic), and some are built because of immediate need (for user traffic that no current circuit can handle, for testing the network or our reachability, and so on).

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. When bootstrap completes, Tor will be ready to handle an application requesting an exit circuit to services like the World Wide Web.

If the consensus does not contain Exits, Tor will only build internal circuits. In this case, earlier statuses will have included "internal" as indicated above. When bootstrap completes, Tor will be ready to handle an application requesting an internal circuit to hidden services at ".onion" addresses.

If a future consensus contains Exits, exit circuits may become available.]

When a client application creates a new stream (by opening a SOCKS connection or launching a resolve request), we attach it to an appropriate open circuit if one exists, or wait if an appropriate circuit is in-progress. We launch a new circuit only if no current circuit can handle the request. We rotate circuits over time to avoid some profiling attacks.

To build a circuit, we choose all the nodes we want to use, and then construct the circuit. Sometimes, when we want a circuit that ends at a given hop, and we have an appropriate unused circuit, we "cannibalize" the existing circuit and extend it to the new terminus.

These processes are described in more detail below.

This document describes Tor's automatic path selection logic only; path selection can be overridden by a controller (with the EXTENDCIRCUIT and ATTACHSTREAM commands). Paths constructed through these means may violate some constraints given below.

Terminology

A "path" is an ordered sequence of nodes, not yet built as a circuit.

A "clean" circuit is one that has not yet been used for any traffic.

A "fast" or "stable" or "valid" node is one that has the 'Fast' or 'Stable' or 'Valid' flag set respectively, based on our current directory information. A "fast" or "stable" circuit is one consisting only of "fast" or "stable" nodes.

In an "exit" circuit, the final node is chosen based on waiting stream requests if any, and in any case it avoids nodes with exit policy of "reject *:*". An "internal" circuit, on the other hand, is one where the final node is chosen just like a middle node (ignoring its exit policy).

A "request" is a client-side stream or DNS resolve that needs to be served by a circuit.

A "pending" circuit is one that we have started to build, but which has not yet completed.

A circuit or path "supports" a request if it is okay to use the circuit/path to fulfill the request, according to the rules given below. A circuit or path "might support" a request if some aspect of the request is unknown (usually its target IP), but we believe the path probably supports the request according to the rules given below.

A relay's bandwidth

Old versions of Tor did not report bandwidths in network status documents, so clients had to learn them from the routers' advertised relay descriptors.

For versions of Tor prior to 0.2.1.17-rc, everywhere below where we refer to a relay's "bandwidth", we mean its clipped advertised bandwidth, computed by taking the smaller of the 'rate' and 'observed' arguments to the "bandwidth" element in the relay's descriptor. If a router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH (currently 10 MB/s), we clip it to that value.

For more recent versions of Tor, we take the bandwidth value declared in the consensus, and fall back to the clipped advertised bandwidth only if the consensus does not have bandwidths listed.
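The two rules above can be sketched as one function. The function name is hypothetical, the byte value chosen for "10 MB/s" is an assumption, and the sketch does not reconcile any unit difference between descriptor and consensus values.

```python
from typing import Optional

# "10 MB/s" from the text; the exact byte count is an assumption.
MAX_BELIEVABLE_BANDWIDTH = 10_000_000


def relay_bandwidth(rate: int, observed: int,
                    consensus_bw: Optional[int] = None) -> int:
    """Return the bandwidth figure used for path selection."""
    if consensus_bw is not None:
        # Newer behavior: prefer the value declared in the consensus.
        return consensus_bw
    # Fallback: clipped advertised bandwidth from the descriptor.
    return min(rate, observed, MAX_BELIEVABLE_BANDWIDTH)
```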

Building circuits

When we build

We don't build circuits until we have enough directory info

There's a class of possible attacks where our directory servers only give us information about the relays that they would like us to use. To prevent this attack, we don't build multi-hop circuits for real traffic (like those in 2.1.1, 2.1.2, 2.1.4 below) until we have enough directory information to be reasonably confident this attack isn't being done to us.

Here, "enough" directory information is defined as:

* Having a consensus that's been valid at some point in the last REASONABLY_LIVE_TIME interval (24 hours).
* Having enough descriptors that we could build at least some fraction F of all bandwidth-weighted paths, without taking ExitNodes/EntryNodes/etc into account. (F is set by the PathsNeededToBuildCircuits option, defaulting to the 'min_paths_for_circs_pct' consensus parameter, with a final default value of 60%.)
* Having enough descriptors that we could build at least some fraction F of all bandwidth-weighted paths, _while_ taking ExitNodes/EntryNodes/etc into account. (F is as above.)
* Having a descriptor for every one of the first NUM_USABLE_PRIMARY_GUARDS guards among our primary guards. (see guard-spec.txt)

We define the "fraction of bandwidth-weighted paths" as the product of these three fractions.

* The fraction of descriptors that we have for nodes with the Guard flag, weighted by their bandwidth for the guard position.
* The fraction of descriptors that we have for all nodes, weighted by their bandwidth for the middle position.
* The fraction of descriptors that we have for nodes with the Exit flag, weighted by their bandwidth for the exit position.

If the consensus has zero weighted bandwidth for a given kind of relay (Guard, Middle, or Exit), Tor instead uses the fraction of relays for which it has the descriptor (not weighted by bandwidth at all).

If the consensus lists zero exit-flagged relays, Tor instead uses the fraction of middle relays.
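The product of the three fractions, with both fallbacks, can be sketched as follows. The function names and the (weighted bandwidth, have-descriptor) pair layout are hypothetical; returning 0.0 for an empty guard or middle list is also an assumption.

```python
from typing import List, Tuple

# (bandwidth weighted for the position, whether we hold the descriptor)
Node = Tuple[float, bool]


def weighted_fraction(nodes: List[Node]) -> float:
    """Fraction of descriptors we have, weighted by bandwidth."""
    if not nodes:
        return 0.0  # assumption: nothing listed means nothing usable
    total = sum(bw for bw, _ in nodes)
    if total == 0:
        # Zero weighted bandwidth: fall back to an unweighted count.
        return sum(1 for _, have in nodes if have) / len(nodes)
    return sum(bw for bw, have in nodes if have) / total


def fraction_of_paths(guards: List[Node], middles: List[Node],
                      exits: List[Node]) -> float:
    """Product of the guard, middle and exit fractions."""
    # With zero exit-flagged relays, use the middle fraction instead.
    f_exit = weighted_fraction(exits) if exits else weighted_fraction(middles)
    return weighted_fraction(guards) * weighted_fraction(middles) * f_exit
```

A client would compare the result against F (60% by default, per the list above) before building circuits for real traffic.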

Clients build circuits preemptively

When running as a client, Tor tries to maintain at least a certain number of clean circuits, so that new streams can be handled quickly. To increase the likelihood of success, Tor tries to predict what circuits will be useful by choosing from among nodes that support the ports we have used in the recent past (by default one hour). Specifically, on startup Tor tries to maintain one clean fast exit circuit that allows connections to port 80, and at least two fast clean stable internal circuits in case we get a resolve request or hidden service request (at least three if we run a hidden service).

After that, Tor will adapt the circuits that it preemptively builds based on the requests it sees from the user: it tries to have two fast clean exit circuits available for every port seen within the past hour (each circuit can be adequate for many predicted ports -- it doesn't need two separate circuits for each port), and it tries to have the above internal circuits available if we've seen resolves or hidden service activity within the past hour. If there are 12 or more clean circuits open, it doesn't open more even if it has more predictions.

Only stable circuits can "cover" a port that is listed in the LongLivedPorts config option. Similarly, hidden service requests to ports listed in LongLivedPorts make us create stable internal circuits.

Note that if there are no requests from the user for an hour, Tor will predict no use and build no preemptive circuits.

The Tor client SHOULD NOT store its list of predicted requests to a persistent medium.

Clients build circuits on demand

Additionally, when a client request exists that no circuit (built or pending) might support, we create a new circuit to support the request. For exit connections, we pick an exit node that will handle the most pending requests (choosing arbitrarily among ties), launch a circuit to end there, and repeat until every unattached request might be supported by a pending or built circuit. For internal circuits, we pick an arbitrary acceptable path, repeating as needed.
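The exit-selection loop for on-demand circuits resembles a greedy covering procedure, sketched below. The names are hypothetical, and the mapping from exits to the requests they might support is assumed to have been computed elsewhere.

```python
from typing import Dict, List, Set


def plan_exit_circuits(supports: Dict[str, Set[str]],
                       pending: Set[str]) -> List[str]:
    """Pick exits until every coverable pending request is handled.

    supports maps an exit's name to the set of request ids it might
    support. Ties are broken by dictionary order, which stands in for
    the "arbitrary" choice in the text.
    """
    remaining = set(pending)
    chosen: List[str] = []
    while remaining:
        best = max(supports, key=lambda e: len(supports[e] & remaining),
                   default=None)
        if best is None or not (supports[best] & remaining):
            break  # no exit can handle what is left
        chosen.append(best)
        remaining -= supports[best]
    return chosen
```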

Clients consider a circuit to become "dirty" as soon as a stream is attached to it, or some other request is performed over the circuit. If a circuit has been "dirty" for at least MaxCircuitDirtiness seconds, new streams may not be attached to it.

In some cases we can reuse an already established circuit if it's clean; see Section 2.3 (cannibalizing circuits) for details.

Relays build circuits for testing reachability and bandwidth

Tor relays test reachability of their ORPort once they have successfully built a circuit (on startup and whenever their IP address changes). They build an ordinary fast internal circuit with themselves as the last hop. As soon as any testing circuit succeeds, the Tor relay decides it's reachable and is willing to publish a descriptor.

We launch multiple testing circuits (one at a time), until we have NUM_PARALLEL_TESTING_CIRC (4) such circuits open. Then we do a "bandwidth test" by sending a certain number of relay drop cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE total cells divided across the four circuits, but never more than CIRCWINDOW_START (1000) cells total. This exercises both outgoing and incoming bandwidth, and helps to jumpstart the observed bandwidth (see dir-spec.txt).
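The cell count for the bandwidth test works out as in the sketch below. The value of CELL_NETWORK_SIZE is an assumption (it is not given in this section), and integer division is assumed for the split across circuits.

```python
from typing import Tuple

CELL_NETWORK_SIZE = 512          # assumption: cell size in bytes on the wire
CIRCWINDOW_START = 1000          # cap on total cells, from the text
NUM_PARALLEL_TESTING_CIRC = 4    # number of testing circuits, from the text


def bandwidth_test_cells(bandwidth_rate: int) -> Tuple[int, int]:
    """Return (total cells, cells per circuit) for the bandwidth test."""
    total = min(bandwidth_rate * 10 // CELL_NETWORK_SIZE, CIRCWINDOW_START)
    return total, total // NUM_PARALLEL_TESTING_CIRC
```

Under these assumptions, a BandwidthRate of 20480 bytes per second gives 400 cells in total, or 100 per circuit; any rate above roughly 51 KB/s hits the 1000-cell cap.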

Tor relays also test reachability of their DirPort once they have established a circuit, but they use an ordinary exit circuit for this purpose.

Hidden-service circuits

See section 4 below.

Rate limiting of failed circuits

If we fail to build a circuit N times in an X-second period (see Section 2.3 for how this works), we stop building circuits until the X seconds have elapsed. XXXX

When to tear down circuits

Clients should tear down circuits (in general) only when those circuits have no streams on them. Additionally, clients should tear-down stream-less circuits only under one of the following conditions:

- The circuit has never had a stream attached, and it was created too long in the past (based on CircuitsAvailableTimeout or cbtlearntimeout, depending on timeout estimate status).
- The circuit is dirty (has had a stream attached), and it has been dirty for at least MaxCircuitDirtiness.
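The teardown conditions can be sketched as a single predicate. All parameter names are hypothetical; idle_timeout stands for whichever of CircuitsAvailableTimeout or cbtlearntimeout applies, and times are in seconds.

```python
def should_tear_down(n_streams: int, ever_used: bool,
                     age_since_created: float, dirty_for: float,
                     idle_timeout: float,
                     max_circuit_dirtiness: float) -> bool:
    """Decide whether a client circuit may be torn down."""
    if n_streams > 0:
        # Circuits with streams on them are kept.
        return False
    if not ever_used:
        # Never had a stream attached: drop once it is too old.
        return age_since_created > idle_timeout
    # Dirty circuit: drop once dirty for at least MaxCircuitDirtiness.
    return dirty_for >= max_circuit_dirtiness
```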

Path selection and constraints

We choose the path for each new circuit before we build it. We choose the exit node first, followed by the other nodes in the circuit, front to back. (In other words, for a 3-hop circuit, we first pick hop 3, then hop 1, then hop 2.) All paths we generate obey the following constraints:

- We do not choose the same router twice for the same path.
- We do not choose any router in the same family as another in the same path. (Two routers are in the same family if each one lists the other in the "family" entries of its descriptor.)
- We do not choose more than one router in a given /16 subnet (unless EnforceDistinctSubnets is 0).
- We don't choose any non-running or non-valid router unless we have been configured to do so. By default, we are configured to allow non-valid routers in "middle" and "rendezvous" positions.
- If we're using Guard nodes, the first node must be a Guard (see 5 below)
- XXXX Choosing the length
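The first three constraints in the list above can be checked pairwise, as in this sketch. The Router class and function names are hypothetical, routers are identified by nickname for simplicity, and only IPv4 addresses are considered.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Router:
    nickname: str
    address: str                       # IPv4 address, dotted quad
    family: FrozenSet[str] = frozenset()  # nicknames listed as family


def same_slash16(a: str, b: str) -> bool:
    """True if both IPv4 addresses fall in the same /16 subnet."""
    net = ipaddress.ip_network(a + "/16", strict=False)
    return ipaddress.ip_address(b) in net


def in_same_family(a: Router, b: Router) -> bool:
    """Family membership requires each router to list the other."""
    return b.nickname in a.family and a.nickname in b.family


def path_is_acceptable(path: Sequence[Router],
                       enforce_distinct_subnets: bool = True) -> bool:
    for i, a in enumerate(path):
        for b in path[i + 1:]:
            if a.nickname == b.nickname:
                return False
            if in_same_family(a, b):
                return False
            if enforce_distinct_subnets and same_slash16(a.address, b.address):
                return False
    return True
```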

For "fast" circuits, we only choose nodes with the Fast flag. For non-"fast" circuits, all nodes are eligible.

For all circuits, we weight node selection according to router bandwidth.

We also weight the bandwidth of Exit and Guard flagged nodes depending on the fraction of total bandwidth that they make up and depending upon the position they are being selected for.

These weights are published in the consensus, and are computed as described in Section "Computing Bandwidth Weights" of dir-spec.txt. They are:

Wgg - Weight for Guard-flagged nodes in the guard position
Wgm - Weight for non-flagged nodes in the guard position
Wgd - Weight for Guard+Exit-flagged nodes in the guard position
Wmg - Weight for Guard-flagged nodes in the middle position
Wmm - Weight for non-flagged nodes in the middle position
Wme - Weight for Exit-flagged nodes in the middle position
Wmd - Weight for Guard+Exit-flagged nodes in the middle position
Weg - Weight for Guard-flagged nodes in the exit position
Wem - Weight for non-flagged nodes in the exit position
Wee - Weight for Exit-flagged nodes in the exit position
Wed - Weight for Guard+Exit-flagged nodes in the exit position
Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes
Wbg - Weight for Guard-flagged nodes for BEGIN_DIR requests
Wbm - Weight for non-flagged nodes for BEGIN_DIR requests
Wbe - Weight for Exit-flagged nodes for BEGIN_DIR requests
Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests

If any of those weights is malformed or not present in a consensus, clients proceed with the regular path selection algorithm setting the weights to the default value of 10000.

Additionally, we may be building circuits with one or more requests in mind. Each kind of request puts certain constraints on paths:

- All service-side introduction circuits and all rendezvous paths should be Stable.
- All connection requests for connections that we think will need to stay open a long time require Stable circuits. Currently, Tor decides this by examining the request's target port, and comparing it to a list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050, 5190, 5222, 5223, 6667, 6697, 8300.)
- DNS resolves require an exit node whose exit policy is not equivalent to "reject *:*".
- Reverse DNS resolves require a version of Tor with advertised eventdns support (available in Tor 0.1.2.1-alpha-dev and later).
- All connection requests require an exit node whose exit policy supports their target address and port (if known), or which "might support it" (if the address isn't known). See 2.2.1.
- Rules for Fast? XXXXX
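The long-lived port rule reduces to a set membership test. The sketch below is non-normative; the function name is an assumption, and the port list is the default quoted above.

```python
# Default "long-lived" ports, as listed above.
LONG_LIVED_PORTS = frozenset(
    {21, 22, 706, 1863, 5050, 5190, 5222, 5223, 6667, 6697, 8300}
)


def needs_stable_circuit(target_port, long_lived_ports=LONG_LIVED_PORTS):
    """Return True if a request to target_port should use a Stable circuit."""
    return target_port in long_lived_ports
```

For example, an SSH connection (port 22) asks for a Stable circuit, while a plain HTTP request (port 80) does not.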

Choosing an exit

If we know what IP address we want to connect to or resolve, we can trivially tell whether a given router will support it by simulating its declared exit policy.

Because we often connect to addresses of the form hostname:port, we do not always know the target IP address when we select an exit node. In these cases, we need to pick an exit node that "might support" connections to a given port at an unknown address. An exit node "might support" such a connection if any clause that accepts any connections to that port precedes all clauses (if any) that reject all connections to that port.
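The "might support" rule can be sketched as a single pass over the policy clauses. This is a non-normative sketch: the clause representation (a tuple of action, address pattern, low port, high port, with "*" meaning every address) is an assumption made for the example and is not the descriptor syntax.

```python
def might_support_port(policy, port):
    """Return True if an exit with this policy might support the port.

    policy is an ordered list of (action, address, low_port, high_port)
    tuples; address "*" matches all addresses (hypothetical encoding).
    """
    for action, address, low, high in policy:
        if not (low <= port <= high):
            continue  # clause says nothing about this port
        if action == "accept":
            # An accepting clause came before any reject-all clause.
            return True
        if action == "reject" and address == "*":
            # All connections to this port are rejected from here on.
            return False
        # A reject for only some addresses does not settle the question.
    return False
```

A partial reject such as "reject 10.0.0.0/8:*" does not end the search, because the unknown target address may fall outside the rejected range.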

Unless requested to do so by the user, we never choose an exit node flagged as "BadExit" by more than half of the authorities who advertise themselves as listing bad exits.

User configuration

Users can alter the default behavior for path selection with configuration options.

- If "ExitNodes" is provided, then every request requires an exit node on the ExitNodes list. (If a request is supported by no nodes on that list, and StrictExitNodes is false, then Tor treats that request as if ExitNodes were not provided.)
- "EntryNodes" and "StrictEntryNodes" behave analogously.
- If a user tries to connect to or resolve a hostname of the form <target>.<servername>.exit, the request is rewritten to a request for <target>, and the request is only supported by the exit whose nickname or fingerprint is <servername>.
- When set, "HSLayer2Nodes" and "HSLayer3Nodes" relax Tor's path restrictions to allow nodes in the same /16 and node family to reappear in the path. They also allow the guard node to be chosen as the RP, IP, and HSDIR, and as the hop before those positions.

Cannibalizing circuits

If we need a circuit and have a clean one already established, in some cases we can adapt the clean circuit for our new purpose. Specifically,

For hidden service interactions, we can "cannibalize" a clean internal circuit if one is available, so we don't need to build those circuits from scratch on demand.

We can also cannibalize clean circuits when the client asks to exit at a given node -- either via the ".exit" notation or because the destination is running at the same location as an exit node.

Learning when to give up ("timeout") on circuit construction

Since version 0.2.2.8-alpha, Tor clients attempt to learn when to give up on circuits based on network conditions.

Distribution choice

Based on studies of build times, we found that the distribution of circuit build times appears to be a Frechet distribution (and a multi-modal Frechet distribution, if more than one guard or bridge is used). However, estimators and quantile functions of the Frechet distribution are difficult to work with and slow to converge. So instead, since we are only interested in the accuracy of the tail, clients approximate the tail of the multi-modal distribution with a single Pareto curve.

How much data to record

From our observations, the minimum number of circuit build times for a reasonable fit appears to be on the order of 100. However, to keep a good fit over the long term, clients store the 1000 most recent circuit build times in a circular array.

These build times only include the times required to build three-hop circuits, and the times required to build the first three hops of circuits with more than three hops. Circuits of fewer than three hops are not recorded, and hops past the third are not recorded.

The Tor client should build test circuits at a rate of one every 'cbttestfreq' (10 seconds) until 'cbtmincircs' (100 circuits) are built, with a maximum of 'cbtmaxopencircs' (default: 10) circuits open at once. This allows a fresh Tor to have a CircuitBuildTimeout estimated within 30 minutes after install or network change (see section 2.4.5 below).

Timeouts are stored on disk in a histogram of 10ms bin width, the same width used to calculate the Xm value (see "Parameter estimation" below). The timeouts recorded in the histogram must be shuffled after being read from disk, to preserve a proper expiration of old values after restart.

Thus, some build time resolution is lost during restart. Implementations may choose a different persistence mechanism than this histogram, but be aware that build time binning is still needed for parameter estimation.

Parameter estimation

Once 'cbtmincircs' build times are recorded, Tor clients update the distribution parameters and recompute the timeout every circuit completion (though see section 2.4.5 for when to pause and reset timeout due to too many circuits timing out).

Tor clients calculate the parameters for a Pareto distribution fitting the data using the maximum likelihood estimator. For derivation, see: https://en.wikipedia.org/wiki/Pareto_distribution#Estimation_of_parameters

Because build times are not a true Pareto distribution, we alter how Xm is computed. In a max likelihood estimator, the mode of the distribution is used directly as Xm.

Instead of using the mode of discrete build times directly, Tor clients compute the Xm parameter using the weighted average of the midpoints of the 'cbtnummodes' (10) most frequently occurring 10ms histogram bins. Ties are broken in favor of earlier bins (that is, in favor of bins corresponding to shorter build times).

(The use of 10 modes was found to minimize error from the selected cbtquantile, with 10ms bins for quantiles 60-80, compared to many other heuristics).
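The Xm computation can be sketched as below. This is a non-normative sketch: the function name is an assumption, and "weighted average" is interpreted here as weighting each bin midpoint by the number of build times in that bin.

```python
from collections import Counter

CBT_BIN_WIDTH_MS = 10  # histogram bin width from the text above


def estimate_xm(build_times_ms, num_modes=10, bin_width=CBT_BIN_WIDTH_MS):
    """Estimate Xm from the num_modes most populated histogram bins."""
    if not build_times_ms:
        raise ValueError("no build times recorded")
    # Count how many build times fall into each bin.
    counts = Counter(int(t // bin_width) for t in build_times_ms)
    # Most frequent bins first; ties go to the earlier (shorter) bin.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:num_modes]
    total = sum(count for _, count in top)
    # Average of the bin midpoints, weighted by bin count (assumption).
    return sum((b * bin_width + bin_width / 2) * c for b, c in top) / total
```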

To avoid ln(1.0+epsilon) precision issues, use log laws to rewrite the estimator for 'alpha' as the sum of logs followed by subtraction, rather than multiplication and division:

alpha = n/(Sum_n{ln(MAX(Xm, x_i))} - n*ln(Xm))

In this, n is the total number of build times that have completed, x_i is the ith recorded build time, and Xm is the mode-based estimate computed from the x_i as described above.

All times below Xm are counted as having the Xm value via the MAX(), because in Pareto estimators, Xm is supposed to be the lowest value. However, since clients use mode averaging to estimate Xm, there can be values below our Xm. Effectively, the Pareto estimator then treats everything smaller than Xm as if it had happened at Xm. Without this clamping, the denominator could become negative, making alpha negative, which results in an exponential curve, not a Pareto probability distribution.
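The estimator for alpha can be written directly from the formula above. This is a non-normative sketch; the function name and the error raised on a degenerate sample are assumptions made for the example.

```python
import math


def estimate_alpha(build_times_ms, xm):
    """Maximum likelihood estimate of the Pareto shape parameter alpha.

    Implements alpha = n / (Sum{ln(MAX(Xm, x_i))} - n*ln(Xm)).
    """
    n = len(build_times_ms)
    # Clamp every observation below Xm up to Xm, as described above.
    log_sum = sum(math.log(max(xm, x)) for x in build_times_ms)
    denominator = log_sum - n * math.log(xm)
    if denominator <= 0:
        # Happens when no observation exceeds Xm (assumed error handling).
        raise ValueError("no build time exceeds Xm; alpha is undefined")
    return n / denominator
```

Because of the clamping, an observation below Xm contributes zero to the denominator but still counts toward n.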

The timeout itself is calculated by using the Pareto Quantile function (the inverted CDF) to give us the value on the CDF such that 80% of the mass of the distribution is below the timeout value (parameter 'cbtquantile').

The Pareto Quantile Function (inverse CDF) is:

F(q) = Xm/((1.0-q)^(1.0/alpha))

Thus, clients obtain the circuit build timeout for 3-hop circuits by computing:

timeout_ms = F(0.8) # 'cbtquantile' == 0.8

With this, we expect that the Tor client will accept the fastest 80% of the total number of paths on the network.

Clients obtain the circuit close time to completely abandon circuits as:

close_ms = F(0.99) # 'cbtclosequantile' == 0.99

To avoid waiting an unreasonably long period of time for circuits that simply have relays that are down, Tor clients cap timeout_ms at the max build time actually observed so far, and cap close_ms at twice this max, but at least 60 seconds:

timeout_ms = MIN(timeout_ms, max_observed_timeout)

close_ms = MAX(MIN(close_ms, 2*max_observed_timeout), 'cbtinitialtimeout')
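Putting the quantile function and the caps together gives the following non-normative sketch. The function names are assumptions; the default arguments are the consensus parameter defaults quoted in this section.

```python
def pareto_quantile(q, xm, alpha):
    """Pareto quantile function: F(q) = Xm / ((1 - q)^(1/alpha))."""
    return xm / ((1.0 - q) ** (1.0 / alpha))


def compute_timeouts(xm, alpha, max_observed_timeout,
                     cbtquantile=80, cbtclosequantile=99,
                     cbtinitialtimeout=60000):
    """Return (timeout_ms, close_ms) for 3-hop circuits."""
    # Quantile parameters are percentages; convert to fractions.
    timeout_ms = pareto_quantile(cbtquantile / 100.0, xm, alpha)
    close_ms = pareto_quantile(cbtclosequantile / 100.0, xm, alpha)
    # Cap the usage timeout at the largest build time seen so far.
    timeout_ms = min(timeout_ms, max_observed_timeout)
    # Cap the close timeout at twice that, but never below the floor.
    close_ms = max(min(close_ms, 2 * max_observed_timeout),
                   cbtinitialtimeout)
    return timeout_ms, close_ms
```

For instance, with Xm = 1000 ms and alpha = 1, the uncapped values are about 5000 ms and 100000 ms. If the largest observed build time is 4000 ms, the caps yield a 4000 ms usage timeout and a 60000 ms close timeout.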

Calculating timeouts thresholds for circuits of different lengths

The timeout_ms and close_ms estimates above are good only for 3-hop circuits, since only 3-hop circuits are recorded in the list of build times.

To calculate the appropriate timeouts and close timeouts for circuits of other lengths, the client multiplies the timeout_ms and close_ms values by a scaling factor determined by the number of communication hops needed to build their circuits:

timeout_ms[hops=N] = timeout_ms * Actions(N) / Actions(3)

close_ms[hops=N] = close_ms * Actions(N) / Actions(3)

where Actions(N) = N * (N + 1) / 2.

To calculate timeouts for operations other than circuit building, the client should add X to Actions(N) for every round-trip communication required with the Xth hop.
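The scaling rule can be sketched as follows. This is a non-normative sketch; the function names, and the representation of extra round trips as a list of hop numbers, are assumptions made for the example.

```python
def actions(n):
    """Actions(N) = N * (N + 1) / 2."""
    return n * (n + 1) // 2


def scale_timeout(timeout_ms, hops):
    """Scale a 3-hop timeout (or close timeout) to a circuit of `hops` hops."""
    return timeout_ms * actions(hops) / actions(3)


def operation_actions(hops, round_trip_hops=()):
    """Actions for building the circuit plus extra round trips.

    round_trip_hops lists, for each extra round trip, the hop number X
    it is exchanged with; each one adds X to Actions(N).
    """
    return actions(hops) + sum(round_trip_hops)
```

A 4-hop circuit therefore gets 10/6 of the 3-hop timeout, and one extra round trip with the third hop of a 3-hop circuit raises the action count from 6 to 9.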

How to record timeouts

Pareto estimators begin to lose their accuracy if the tail is omitted. Hence, Tor clients actually calculate two timeouts: a usage timeout, and a close timeout.

Circuits that pass the usage timeout are marked as measurement circuits, and are allowed to continue to build until the close timeout corresponding to the point 'cbtclosequantile' (default 99) on the Pareto curve, or 60 seconds, whichever is greater.

The actual completion times for these measurement circuits should be recorded.

Implementations should completely abandon a circuit if its total build time exceeds the close threshold. Such closed circuits should be ignored, as this typically means one of the relays in the path is offline.

Detecting Changing Network Conditions

Tor clients attempt to detect both network connectivity loss and drastic changes in the timeout characteristics.

To detect changing network conditions, clients keep a history of the timeout or non-timeout status of the past 'cbtrecentcount' circuits (20 circuits) that successfully completed at least one hop. If more than 90% of these circuits time out, the client discards all build time history, resets the timeout to 'cbtinitialtimeout' (60 seconds), and then begins recomputing the timeout.

If the timeout was already at least cbtinitialtimeout, the client doubles the timeout.

The records here (of how many circuits succeeded or failed among the most recent 'cbtrecentcount') are not stored as persistent state. On reload, we start with a new, empty state.
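A non-normative sketch of this bookkeeping follows. The class and function names are assumptions. The reset is triggered here once the number of recent timeouts reaches 'cbtmaxtimeouts' (default 18), following that parameter's description; the exact comparison used at the threshold is an assumption of this sketch.

```python
from collections import deque


class RecentCircuitOutcomes:
    """In-memory window of recent circuit outcomes (never persisted)."""

    def __init__(self, recent_count=20, max_timeouts=18):
        self.max_timeouts = max_timeouts
        # True means the circuit timed out; old entries fall off the left.
        self.outcomes = deque(maxlen=recent_count)

    def record(self, timed_out):
        """Record one outcome; return True if history must be discarded."""
        self.outcomes.append(bool(timed_out))
        if sum(self.outcomes) >= self.max_timeouts:
            self.outcomes.clear()
            return True
        return False


def timeout_after_reset(current_timeout_ms, cbtinitialtimeout=60000):
    """Timeout to use right after the build time history is discarded."""
    if current_timeout_ms >= cbtinitialtimeout:
        # Already at or above the initial value: double it.
        return current_timeout_ms * 2
    return cbtinitialtimeout
```

Only circuits that completed at least one hop should be passed to record(), per the text above.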

Consensus parameters governing behavior

Clients that implement circuit build timeout learning should obey the following consensus parameters that govern behavior, in order to allow us to handle bugs or other emergent behaviors due to client circuit construction. If these parameters are not present in the consensus, the listed default values should be used instead.

cbtdisabled
    Default: 0
    Min: 0
    Max: 1
    Effect: If 1, all CircuitBuildTime learning code should be disabled and history should be discarded. For use in emergency situations only.

cbtnummodes
    Default: 10
    Min: 1
    Max: 20
    Effect: This value governs how many modes to use in the weighted average calculation of Pareto parameter Xm. Selecting Xm as the average of multiple modes improves accuracy of the Pareto tail for quantile cutoffs from 60-80% (see cbtquantile).

cbtrecentcount
    Default: 20
    Min: 3
    Max: 1000
    Effect: This is the number of circuit build outcomes (success vs timeout) to keep track of for the following option.

cbtmaxtimeouts
    Default: 18
    Min: 3
    Max: 10000
    Effect: When this many timeouts happen in the last 'cbtrecentcount' circuit attempts, the client should discard all of its history and begin learning a fresh timeout value. Note that if this parameter's value is greater than the value of 'cbtrecentcount', then the history will never be discarded because of this feature.

cbtmincircs
    Default: 100
    Min: 1
    Max: 10000
    Effect: This is the minimum number of circuits to build before computing a timeout. Note that if this parameter's value is higher than 1000 (the number of time observations that a client keeps in its circular buffer), circuit build timeout calculation is effectively disabled, and the default timeouts are used indefinitely.

cbtquantile
    Default: 80
    Min: 10
    Max: 99
    Effect: This is the position on the quantile curve to use to set the timeout value. It is a percent (10-99).

cbtclosequantile
    Default: 99
    Min: Value of cbtquantile parameter
    Max: 99
    Effect: This is the position on the quantile curve to use to set the timeout value to use to actually close circuits. It is a percent (0-99).

cbttestfreq
    Default: 10
    Min: 1
    Max: 2147483647 (INT32_MAX)
    Effect: Describes how often in seconds to build a test circuit to gather timeout values. Only applies if fewer than 'cbtmincircs' have been recorded.

cbtmintimeout
    Default: 10
    Min: 10
    Max: 2147483647 (INT32_MAX)
    Effect: This is the minimum allowed timeout value in milliseconds.

cbtinitialtimeout
    Default: 60000
    Min: Value of cbtmintimeout
    Max: 2147483647 (INT32_MAX)
    Effect: This is the timeout value to use before we have enough data to compute a timeout, in milliseconds. If we do not have enough data to compute a timeout estimate (see cbtmincircs), then we use this interval both for the close timeout and the abandon timeout.

cbtlearntimeout
    Default: 180
    Min: 10
    Max: 60000
    Effect: This is how long idle circuits will be kept open while cbt is learning a new timeout value.

cbtmaxopencircs
    Default: 10
    Min: 0
    Max: 14
    Effect: This is the maximum number of circuits that can be open at the same time during the circuit build time learning phase.

Handling failure

If an attempt to extend a circuit fails (either because the first create failed or a subsequent extend failed) then the circuit is torn down and is no longer pending. (XXXX really?) Requests that might have been supported by the pending circuit thus become unsupported, and a new circuit needs to be constructed.

If a stream "begin" attempt fails with an EXITPOLICY error, we decide that the exit node's exit policy is not correctly advertised, so we treat the exit node as if it were a non-exit until we retrieve a fresh descriptor for it.

Excessive amounts of either type of failure can indicate an attack on anonymity. See section 7 for how excessive failure is handled.

Attaching streams to circuits

When a circuit that might support a request is built, Tor tries to attach the request's stream to the circuit and sends a BEGIN, BEGIN_DIR, or RESOLVE relay cell as appropriate. If the request completes unsuccessfully, Tor considers the reason given in the CLOSE relay cell. [XXX yes, and?]

After a request has remained unattached for SocksTimeout (2 minutes by default), Tor abandons the attempt and signals an error to the client as appropriate (e.g., by closing the SOCKS connection).

XXX Timeouts and when Tor auto-retries.

What stream-end-reasons are appropriate for retrying.

If no reply to BEGIN/RESOLVE, then the stream will time out and fail.

Hidden-service related circuits

XXX Tracking expected hidden service use (client-side and hidserv-side)

Guard nodes

We use Guard nodes (also called "helper nodes" in the research literature) to prevent certain profiling attacks. For an overview of our Guard selection algorithm -- which has grown rather complex -- see guard-spec.txt.

How consensus bandwidth weights factor into entry guard selection

When weighting a list of routers for choosing an entry guard, the following consensus parameters (from the "bandwidth-weights" line) apply:

Wgg - Weight for Guard-flagged nodes in the guard position
Wgm - Weight for non-flagged nodes in the guard position
Wgd - Weight for Guard+Exit-flagged nodes in the guard position
Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes

Please see "bandwidth-weights" in §3.4.1 of dir-spec.txt for more in-depth descriptions of these parameters.

If a router has been marked as both an entry guard and an exit, then we prefer to use it more, with our preference for doing so (roughly) linearly increasing w.r.t. the router's non-guard bandwidth and bandwidth weight (calculated without taking the guard flag into account). From proposal #236:

    Let Wpf denote the weight from the 'bandwidth-weights' line a client would apply to N for position p if it had the guard flag, Wpn the weight if it did not have the guard flag, and B the measured bandwidth of N in the consensus. Then instead of choosing N for position p proportionally to Wpf*B or Wpn*B, clients should choose N proportionally to F*Wpf*B + (1-F)*Wpn*B.

where F is the weight as calculated using the above parameters.
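The blended selection weight can be sketched as below. This is a non-normative sketch: the function name is an assumption, and F is assumed to be a fraction between 0 and 1.

```python
def blended_selection_weight(f, wpf, wpn, bandwidth):
    """Selection weight F*Wpf*B + (1-F)*Wpn*B for node N in position p.

    f:         the fraction F (assumed to lie in [0, 1])
    wpf:       weight applied if N had the guard flag
    wpn:       weight applied if N did not have the guard flag
    bandwidth: measured bandwidth B of N in the consensus
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("F must be between 0 and 1")
    return f * wpf * bandwidth + (1 - f) * wpn * bandwidth
```

At F = 1 this reduces to Wpf*B and at F = 0 to Wpn*B, so the blend interpolates linearly between the two unblended weights.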

Server descriptor purposes

There are currently three "purposes" supported for server descriptors: general, controller, and bridge. Most descriptors are of type general -- these are the ones listed in the consensus, and the ones fetched and used in normal cases.

Controller-purpose descriptors are those delivered by the controller and labelled as such: they will be kept around (and expire like normal descriptors), and they can be used by the controller in its CIRCUITEXTEND commands. Otherwise they are ignored by Tor when it chooses paths.

Bridge-purpose descriptors are for routers that are used as bridges. See doc/design-paper/blocking.pdf for more design explanation, or proposal 125 for specific details. Currently bridge descriptors are used in place of normal entry guards, for Tor clients that have UseBridges enabled.

Detecting route manipulation by Guard nodes (Path Bias)

The Path Bias defense is designed to defend against a type of route capture where malicious Guard nodes deliberately fail or choke circuits that extend to non-colluding Exit nodes to maximize their network utilization in favor of carrying only compromised traffic.

In the extreme, the attack allows an adversary that carries c/n of the network capacity to deanonymize c/n of the network connections, breaking the O((c/n)^2) property of Tor's original threat model. It also allows targeted attacks aimed at monitoring the activity of specific users, bridges, or Guard nodes.

There are two points where path selection can be manipulated: during construction, and during usage. Circuit construction can be manipulated by inducing circuit failures during circuit extend steps, which causes the Tor client to transparently retry the circuit construction with a new path. Circuit usage can be manipulated by abusing the stream retry features of Tor (for example by withholding stream attempt responses from the client until the stream timeout has expired), at which point the tor client will also transparently retry the stream on a new path.

The defense as deployed therefore makes two independent sets of measurements of successful path use: one during circuit construction, and one during circuit usage.

The intended behavior is for clients to ultimately disable the use of Guards responsible for excessive circuit failure of either type (see section 7.4); however, known issues with the Tor network currently restrict the defense to being informational only at this stage (see section 7.5).

Measuring path construction success rates

Clients maintain two counts for each of their guards: a count of the number of times a circuit was extended to at least two hops through that guard, and a count of the number of circuits that successfully complete through that guard. The ratio of these two numbers is used to determine a circuit success rate for that Guard.

Circuit build timeouts are counted as construction failures if the circuit fails to complete before the 95% "right-censored" timeout interval, not the 80% timeout condition (see section 2.4).

If a circuit closes prematurely after construction but before being requested to close by the client, this is counted as a failure.

Measuring path usage success rates

Clients maintain two usage counts for each of their guards: a count of the number of usage attempts, and a count of the number of successful usages.

A usage attempt means any attempt to attach a stream to a circuit.

Usage success status is temporarily recorded by state flags on circuits. Guard usage success counts are not incremented until circuit close. A circuit is marked as successfully used if we receive a properly recognized RELAY cell on that circuit that was expected for the current circuit purpose.

If subsequent stream attachments fail or time out, the successfully used state of the circuit is cleared, causing it once again to be regarded as a usage attempt only.

Upon close by the client, all circuits that are still marked as usage attempts are probed using a RELAY_BEGIN cell constructed with a destination of the form 0.a.b.c:25, where a.b.c is a 24 bit random nonce. If we get a RELAY_COMMAND_END in response matching our nonce, the circuit is counted as successfully used.

If any unrecognized RELAY cells arrive after the probe has been sent, the circuit is counted as a usage failure.

If the stream failure reason codes DESTROY, TORPROTOCOL, or INTERNAL are received in response to any stream attempt, such circuits are not probed and are declared usage failures.

Prematurely closed circuits are not probed, and are counted as usage failures.
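The probe destination described above can be constructed as in this non-normative sketch. The function names are assumptions, and the choice of Python's secrets module as the randomness source is illustrative only.

```python
import secrets


def new_probe_nonce():
    """Return a fresh 24-bit random nonce."""
    return secrets.randbits(24)


def probe_target(nonce):
    """Format a 24-bit nonce as the probe destination 0.a.b.c:25."""
    if not 0 <= nonce < 2 ** 24:
        raise ValueError("nonce must fit in 24 bits")
    # Split the nonce into three octets, most significant first.
    a = (nonce >> 16) & 0xFF
    b = (nonce >> 8) & 0xFF
    c = nonce & 0xFF
    return f"0.{a}.{b}.{c}:25"
```

The client keeps the nonce so that it can match the END cell it receives against the probe it sent.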

Scaling success counts

To provide a moving average of recent Guard activity while still preserving the ability to verify correctness, we periodically "scale" the success counts by multiplying them by a scale factor between 0 and 1.0.

Scaling is performed when either usage or construction attempt counts exceed a parametrized value.

To avoid error due to scaling during circuit construction and use, currently open circuits are subtracted from the usage counts before scaling, and added back after scaling.
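The scaling step can be sketched as follows. This is a non-normative sketch: the function name is an assumption, and so is the choice to subtract the open circuits from the attempt count only, since open circuits have been attempted but not yet resolved.

```python
def scale_counts(attempts, successes, open_circuits, scale_factor=0.5):
    """Scale path bias counts; return (attempts, successes).

    scale_factor corresponds to pb_multfactor/pb_scalefactor (default 1/2).
    """
    if not 0.0 <= scale_factor <= 1.0:
        raise ValueError("scale factor must be between 0 and 1.0")
    # Leave currently open circuits out of the scaling (assumption:
    # they are only reflected in the attempt count).
    attempts -= open_circuits
    attempts *= scale_factor
    successes *= scale_factor
    # Add the open circuits back at full weight.
    attempts += open_circuits
    return attempts, successes
```

For example, 110 attempts and 80 successes with 10 circuits still open become 60 attempts and 40 successes after scaling by 1/2.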

Parametrization

The following consensus parameters tune various aspects of the defense.

pb_mincircs
    Default: 150
    Min: 5
    Effect: This is the minimum number of circuits that must complete at least 2 hops before we begin evaluating construction rates.

pb_noticepct
    Default: 70
    Min: 0
    Max: 100
    Effect: If the circuit success rate falls below this percentage, we emit a notice log message.

pb_warnpct
    Default: 50
    Min: 0
    Max: 100
    Effect: If the circuit success rate falls below this percentage, we emit a warn log message.

pb_extremepct
    Default: 30
    Min: 0
    Max: 100
    Effect: If the circuit success rate falls below this percentage, we emit a more alarmist warning log message. If pb_dropguards is set to 1, we also disable the use of the guard.

pb_dropguards
    Default: 0
    Min: 0
    Max: 1
    Effect: If the circuit success rate falls below pb_extremepct, when pb_dropguards is set to 1, we disable use of that guard.

pb_scalecircs
    Default: 300
    Min: 10
    Effect: After this many circuits have completed at least two hops, Tor performs the scaling described in Section 7.3.

pb_multfactor and pb_scalefactor
    Default: 1/2
    Min: 0.0
    Max: 1.0
    Effect: The double-precision result obtained from pb_multfactor/pb_scalefactor is multiplied by our current counts to scale them.

pb_minuse
    Default: 20
    Min: 3
    Effect: This is the minimum number of circuits that we must attempt to use before we begin evaluating construction rates.

pb_noticeusepct
    Default: 80
    Min: 3
    Effect: If the circuit usage success rate falls below this percentage, we emit a notice log message.

pb_extremeusepct
    Default: 60
    Min: 3
    Effect: If the circuit usage success rate falls below this percentage, we emit a warning log message. We also disable the use of the guard if pb_dropguards is set.

pb_scaleuse
    Default: 100
    Min: 10
    Effect: After we have attempted to use this many circuits, Tor performs the scaling described in Section 7.3.

Known barriers to enforcement

Due to intermittent CPU overload at relays, the normal rate of successful circuit completion is highly variable. The Guard-dropping version of the defense is unlikely to be deployed until the ntor circuit handshake is enabled, or the nature of CPU overload induced failure is better understood.

Old notes

Do we actually do this?

How to deal with network down.

- While all helpers are down/unreachable and there are no established or on-the-way testing circuits, launch a testing circuit. (Do this periodically in the same way we try to establish normal circuits when things are working normally.) (Testing circuits are a special type of circuit, that streams won't attach to by accident.)
- When a testing circuit succeeds, mark all helpers up and hold the testing circuit open.
- If a connection to a helper succeeds, close all testing circuits. Else mark that helper down and try another.
- If the last helper is marked down and we already have a testing circuit established, then add the first hop of that testing circuit to the end of our helper node list, close that testing circuit, and go back to square one. (Actually, rather than closing the testing circuit, can we get away with converting it to a normal circuit and beginning to use it immediately?)

[Do we actually do any of the above? If so, let's spec it. If not, let's remove it. -NM]

A thing we could do to deal with reachability.

And as a bonus, it leads to an answer to Nick's attack ("If I pick my helper nodes all on 18.0.0.0:*, then I move, you'll know where I bootstrapped") -- the answer is to pick your original three helper nodes without regard for reachability. Then the above algorithm will add some more that are reachable for you, and if you move somewhere, it's more likely (though not certain) that some of the originals will become useful. Is that smart or just complex?

Some stuff that worries me about entry guards. 2006 Jun, Nickm.

It is unlikely for two users to have the same set of entry guards. Observing a user is sufficient to learn its entry guards. So, as we move around, entry guards make us linkable. If we want to change guards when our location (IP? subnet?) changes, we have two bad options. We could

- Drop the old guards. But if we go back to our old location, we'll not use our old guards. For a laptop that sometimes gets used from work and sometimes from home, this is pretty fatal.
- Remember the old guards as associated with the old location, and use them again if we ever go back to the old location. This would be nasty, since it would force us to record where we've been.

[Do we do any of this now? If not, this should move into 099-misc or 098-todo. -NM]

Tor Guard Specification

                              Isis Lovecruft
                             George Kadianakis
                                 Ola Bini
                               Nick Mathewson

Table of Contents

    1. Introduction and motivation
    2. State instances
    3. Circuit Creation, Entry Guard Selection (1000 foot view)
        3.1 Path selection
            3.1.1 Managing entry guards
            3.1.2 Middle and exit node selection
        3.2 Circuit Building
    4. The algorithm.
        4.0. The guards listed in the current consensus. [Section:GUARDS]
        4.1. The Sampled Guard Set. [Section:SAMPLED]
        4.2. The Usable Sample [Section:FILTERED]
        4.3. The confirmed-guard list. [Section:CONFIRMED]
        4.4. The Primary guards [Section:PRIMARY]
        4.5. Retrying guards. [Section:RETRYING]
        4.6. Selecting guards for circuits. [Section:SELECTING]
        4.7. When a circuit fails. [Section:ON_FAIL]
        4.8. When a circuit succeeds [Section:ON_SUCCESS]
        4.9. Updating the list of waiting circuits [Section:UPDATE_WAITING]
        4.10. Whenever we get a new consensus. [Section:ON_CONSENSUS]
        4.11. Deciding whether to generate a new circuit.
        4.12. When we are missing descriptors.
    A. Appendices
        A.0. Acknowledgements
        A.1. Parameters with suggested values. [Section:PARAM_VALS]
        A.2. Random values [Section:RANDOM]
        A.3. Why not a sliding scale of primaryness? [Section:CVP]
        A.4. Controller changes
        A.5. Persistent state format

Introduction and motivation

Tor uses entry guards to prevent an attacker who controls some fraction of the network from observing a fraction of every user's traffic. If users chose their entries and exits uniformly at random from the list of servers every time they build a circuit, then an adversary who had (k/N) of the network would deanonymize F=(k/N)^2 of all circuits... and after a given user had built C circuits, the attacker would see them at least once with probability 1-(1-F)^C. With large C, the attacker would get a sample of every user's traffic with probability 1.
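To make the arithmetic concrete, the probability above can be evaluated numerically. This is a non-normative sketch; the function name and the example figures are assumptions chosen only for illustration.

```python
def observation_probability(k, n, circuits):
    """Probability 1 - (1 - F)^C that an adversary holding k of n relays
    sees a user at least once in `circuits` circuits, with F = (k/N)^2.

    Assumes entries and exits are chosen uniformly at random per circuit.
    """
    f = (k / n) ** 2
    return 1 - (1 - f) ** circuits
```

With a hypothetical adversary holding one tenth of the network, a single circuit is compromised with probability about 0.01, and the probability of at least one compromised circuit approaches 1 as the number of circuits grows.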

To prevent this from happening, Tor clients choose a small number of guard nodes (e.g. 3). These guard nodes are the only nodes that the client will connect to directly. If they are not compromised, the user's paths are not compromised.

This specification outlines Tor's guard housekeeping algorithm, which tries to meet the following goals:

- Heuristics and algorithms for determining how and which guards are chosen should be kept as simple and easy to understand as possible.
- Clients in censored regions or who are behind a fascist firewall who connect to the Tor network should not experience any significant disadvantage in terms of reachability or usability.
- Tor should make a best attempt at discovering the most appropriate behavior, with as little user input and configuration as possible.
- Tor clients should discover usable guards without too much delay.
- Tor clients should resist (to the extent possible) attacks that try to force them onto compromised guards.
- Should maintain the load-balancing offered by the path selection algorithm

State instances

In the algorithm below, we describe a set of persistent and non-persistent state variables. These variables should be treated as an object, of which multiple instances can exist.

In particular, we specify the use of three particular instances:

A. UseBridges

If UseBridges is set, then we replace the {GUARDS} set in [Sec:GUARDS] below with the list of configured bridges. We maintain a separate persistent instance of {SAMPLED_GUARDS} and {CONFIRMED_GUARDS} and other derived values for the UseBridges case. In this case, we impose no upper limit on the sample size.

B. EntryNodes / ExcludeNodes / Reachable*Addresses / FascistFirewall / ClientUseIPv4=0

If one of the above options is set, and UseBridges is not, then we compare the fraction of usable guards in the consensus to the total number of guards in the consensus. If this fraction is less than {MEANINGFUL_RESTRICTION_FRAC}, we use a separate instance of the state.

(While Tor is running, we do not change back and forth between the separate instance of the state and the default instance unless the fraction of usable guards is 5% higher than, or 5% lower than, {MEANINGFUL_RESTRICTION_FRAC}. This prevents us from flapping back and forth between instances if we happen to hit {MEANINGFUL_RESTRICTION_FRAC} exactly.)

If this fraction is less than {EXTREME_RESTRICTION_FRAC}, we use a separate instance of the state, and warn the user.

[TODO: should we have a different instance for each set of heavily restricted options?]

C. Default

If neither of the above variant-state instances is used, we use a default instance.

Circuit Creation, Entry Guard Selection (1000 foot view)

A circuit in Tor is a path through the network connecting a client to its destination. At a high level, a three-hop exit circuit will look like this:

Client <-> Entry Guard <-> Middle Node <-> Exit Node <-> Destination

Entry guards are the only nodes which a client will connect to directly. Exit relays are the nodes by which traffic exits the Tor network in order to connect to an external destination.

3.1 Path selection

For any multi-hop circuit, at least one entry guard and middle node(s) are required. An exit node is required if traffic will exit the Tor network. Depending on its configuration, a relay listed in a consensus could be used for any of these roles. However, this specification defines how entry guards specifically should be selected and managed, as opposed to middle or exit nodes.

3.1.1 Managing entry guards

At a high level, a relay listed in a consensus will move through the following states in the process from initial selection to eventual usage as an entry guard:

      relays listed in consensus
                 |
               sampled
               |     |
         confirmed   filtered
               |     |      |
               primary      usable_filtered

Relays listed in the latest consensus can be sampled for guard usage if they have the "Guard" flag. Sampling is random but weighted by a measured bandwidth multiplied by bandwidth-weights (Wgg if guard only, Wgd if guard+exit flagged).
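The weighted sampling described above can be sketched as follows. This is only an illustration: the tuple layout of `candidates` and the function name `sample_guard` are assumptions of the sketch, not part of this specification, and the weights Wgg and Wgd are passed in as plain numbers.

```python
import random

def sample_guard(candidates, wgg, wgd, rng=random):
    """Pick one relay id, weighted by measured bandwidth times its bandwidth-weight.

    candidates: list of (relay_id, measured_bw, is_exit) tuples (hypothetical shape).
    wgg: weight applied to guard-only relays.
    wgd: weight applied to relays flagged as both guard and exit.
    """
    weights = [bw * (wgd if is_exit else wgg) for _, bw, is_exit in candidates]
    # random.choices() rejects an all-zero weight vector, so handle that here.
    if not candidates or sum(weights) <= 0:
        return None
    return rng.choices(candidates, weights=weights, k=1)[0][0]
```

A relay with zero weighted bandwidth is never chosen, and an empty or all-zero candidate list yields None rather than an exception.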

Once a path is built and a circuit established using this guard, it is marked as confirmed. Until this point, guards are first sampled and then filtered based on information such as our current configuration (see SAMPLED and FILTERED sections) and later marked as usable_filtered if the guard is not primary but can be reached.

It is always preferable to use a primary guard when building a new circuit in order to reduce guard churn; only on failure to connect to existing primary guards will new guards be used.

3.1.2 Middle and exit node selection

Middle nodes are selected at random from relays listed in the latest consensus, weighted by bandwidth and bandwidth-weights. Exit nodes are chosen similarly but restricted to relays with a sufficiently permissive exit policy.

3.2 Circuit Building

Once a path is chosen, Tor will use this path to build a new circuit.

If the circuit is built successfully, Tor will either use it immediately, or Tor will wait for a circuit with a more preferred guard if there's a good chance that it will be able to make one.

If the circuit fails in a way that makes us conclude that a guard is not reachable, the guard is marked as unreachable, the circuit is closed, and waiting circuits are updated.

The algorithm.

The guards listed in the current consensus. [Section:GUARDS]

By {set:GUARDS} we mean the set of all guards in the current consensus that are usable for all circuits and directory requests. (They must have the flags: Stable, Fast, V2Dir, Guard.)

Rationale

We require all guards to have the flags that we potentially need from any guard, so that all guards are usable for all circuits.

The Sampled Guard Set. [Section:SAMPLED]

We maintain a set, {set:SAMPLED_GUARDS}, that persists across invocations of Tor. It is an ordered subset of the nodes that we have seen listed as a guard in the consensus at some point, ordered by sample index (that is, in the order in which we sampled them). For each such guard, we record persistently:

- {pvar:ADDED_ON_DATE}: The date on which it was added to sampled_guards. We set this value to a point in the past, using RAND(now, {GUARD_LIFETIME}/10). See Appendix [RANDOM] below.
- {pvar:ADDED_BY_VERSION}: The version of Tor that added it to sampled_guards.
- {pvar:IS_LISTED}: Whether it was listed as a usable Guard in the _most recent_ consensus we have seen.
- {pvar:FIRST_UNLISTED_AT}: If IS_LISTED is false, the publication date of the earliest consensus in which this guard was listed such that we have not seen it listed in any later consensus. Otherwise "None." We randomize this to a point in the past, based on RAND(added_at_time, {REMOVE_UNLISTED_GUARDS_AFTER} / 5).

For each guard in {SAMPLED_GUARDS}, we also record this data, non-persistently:

- {tvar:last_tried_connect}: A 'last tried to connect at' time. Default 'never'.
- {tvar:is_reachable}: an "is reachable" tristate, with possible values { <state:yes>, <state:no>, <state:maybe> }. Default '<maybe>.'

  [Note: "yes" is not strictly necessary, but I'm making it distinct from "maybe" anyway, to make our logic clearer. A guard is "maybe" reachable if it's worth trying. A guard is "yes" reachable if we tried it and succeeded.]

- {tvar:failing_since}: The first time when we failed to connect to this guard. Defaults to "never". Reset to "never" when we successfully connect to this guard.
- {tvar:is_pending}: A "pending" flag. This indicates that we are trying to build an exploratory circuit through the guard, and we don't know whether it will succeed.
- {tvar:pending_since}: A timestamp. Set whenever we set {tvar:is_pending} to true; cleared whenever we set {tvar:is_pending} to false.

We require that {SAMPLED_GUARDS} contain at least {MIN_FILTERED_SAMPLE} guards from the consensus (if possible), but not more than {MAX_SAMPLE_THRESHOLD} of the number of guards in the consensus, and not more than {MAX_SAMPLE_SIZE} in total. (But if the maximum would be smaller than {MIN_FILTERED_SAMPLE}, we set the maximum at {MIN_FILTERED_SAMPLE}.)
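As a worked example of the size bound above, the sketch below computes the maximum sample size from the number of guards in the consensus, using the suggested values from [PARAM_VALS]. Rounding the percentage down with integer division is an assumption of this sketch; the text does not say how a fractional limit is rounded.

```python
MIN_FILTERED_SAMPLE = 20
MAX_SAMPLE_THRESHOLD_PERCENT = 20  # {MAX_SAMPLE_THRESHOLD}, expressed as a percentage
MAX_SAMPLE_SIZE = 60

def max_sample_size(n_guards_in_consensus):
    """Upper bound on the size of {SAMPLED_GUARDS} for a given consensus."""
    # Not more than MAX_SAMPLE_THRESHOLD of the guards, and not more than MAX_SAMPLE_SIZE.
    limit = min(n_guards_in_consensus * MAX_SAMPLE_THRESHOLD_PERCENT // 100,
                MAX_SAMPLE_SIZE)
    # But never let the maximum drop below MIN_FILTERED_SAMPLE.
    return max(limit, MIN_FILTERED_SAMPLE)
```

With these values, a consensus of 50 guards gives a maximum of 20, one of 200 guards gives 40, and one of 1000 guards gives 60.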

To add a new guard to {SAMPLED_GUARDS}, pick an entry at random from ({GUARDS} - {SAMPLED_GUARDS}), according to the path selection rules.

We remove an entry from {SAMPLED_GUARDS} if:

* We have a live consensus, and {IS_LISTED} is false, and {FIRST_UNLISTED_AT} is over {REMOVE_UNLISTED_GUARDS_AFTER} days in the past.

OR

* We have a live consensus, and {ADDED_ON_DATE} is over {GUARD_LIFETIME} ago, *and* {CONFIRMED_ON_DATE} is either "never", or over {GUARD_CONFIRMED_MIN_LIFETIME} ago.
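The two removal conditions can be written as a single predicate. In this sketch a guard is a plain dict whose keys mirror the persistent variables (a hypothetical representation), None stands for "never" or "None", and the durations are the suggested values from [PARAM_VALS].

```python
from datetime import datetime, timedelta

REMOVE_UNLISTED_GUARDS_AFTER = timedelta(days=20)
GUARD_LIFETIME = timedelta(days=120)
GUARD_CONFIRMED_MIN_LIFETIME = timedelta(days=60)

def should_remove(guard, now, have_live_consensus):
    """Return True if this sampled guard has expired under either rule."""
    if not have_live_consensus:
        return False  # both rules require a live consensus
    unlisted_too_long = (
        not guard["is_listed"]
        and guard["first_unlisted_at"] is not None
        and now - guard["first_unlisted_at"] > REMOVE_UNLISTED_GUARDS_AFTER
    )
    confirmed_on = guard["confirmed_on_date"]  # None models "never"
    too_old = (
        now - guard["added_on_date"] > GUARD_LIFETIME
        and (confirmed_on is None
             or now - confirmed_on > GUARD_CONFIRMED_MIN_LIFETIME)
    )
    return unlisted_too_long or too_old
```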

Note that {SAMPLED_GUARDS} does not depend on our configuration. It is possible that we can't actually connect to any of these guards.

Rationale

The {SAMPLED_GUARDS} set is meant to limit the total number of guards that a client will connect to in a given period. The upper limit on its size prevents us from considering too many guards.

The first expiration mechanism is there so that our {SAMPLED_GUARDS} list does not accumulate so many dead guards that we cannot add new ones.

The second expiration mechanism makes us rotate our guards slowly over time.

Ordering the {SAMPLED_GUARDS} set in the order in which we sampled those guards, and picking guards from that set according to this ordering, improves load-balancing. It brings actual guard usage closer to the usage expected under the path selection rules.

The ordering also improves on another objective of this proposal: trying to resist an adversary pushing clients over compromised guards, since the adversary would need the clients to exhaust all their initial {SAMPLED_GUARDS} set before having a chance to use a newly deployed adversary node.

The Usable Sample [Section:FILTERED]

We maintain another set, {set:FILTERED_GUARDS}, that does not persist. It is derived from:

- {SAMPLED_GUARDS},
- our current configuration,
- the path bias information.

A guard is a member of {set:FILTERED_GUARDS} if and only if all of the following are true:

- It is a member of {SAMPLED_GUARDS}, with {IS_LISTED} set to true.
- It is not disabled because of path bias issues.
- It is not disabled because of ReachableAddresses policy, the ClientUseIPv4 setting, the ClientUseIPv6 setting, the FascistFirewall setting, or some other option that prevents using some addresses.
- It is not disabled because of ExcludeNodes.
- It is a bridge if UseBridges is true; or it is not a bridge if UseBridges is false.
- Is included in EntryNodes if EntryNodes is set and UseBridges is not. (But see 2.B above).

We have an additional subset, {set:USABLE_FILTERED_GUARDS}, which is defined to be the subset of {FILTERED_GUARDS} where {is_reachable} is <yes> or <maybe>.
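Both derived sets can be computed with simple filters over the sample. In the sketch below, guards are dicts kept in sample order, and the configuration and path-bias checks from the list above are collapsed into one caller-supplied predicate, `is_permitted`; that predicate and the dict keys are hypothetical stand-ins, not names defined by this specification.

```python
def filtered_guards(sampled, is_permitted):
    """Derive {FILTERED_GUARDS} from {SAMPLED_GUARDS}, preserving sample order.

    is_permitted(guard) models every configuration and path-bias check
    (ExcludeNodes, EntryNodes, UseBridges, address restrictions, ...).
    """
    return [g for g in sampled if g["is_listed"] and is_permitted(g)]

def usable_filtered_guards(filtered):
    """Derive {USABLE_FILTERED_GUARDS}: filtered guards not known to be down."""
    return [g for g in filtered if g["is_reachable"] in ("yes", "maybe")]
```

Because neither set is persisted, both can simply be recomputed whenever the sample, the configuration, or a guard's reachability changes.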

We try to maintain a requirement that {USABLE_FILTERED_GUARDS} contain at least {MIN_FILTERED_SAMPLE} elements:

Whenever we are going to sample from {USABLE_FILTERED_GUARDS}, and it contains fewer than {MIN_FILTERED_SAMPLE} elements, we add new elements to {SAMPLED_GUARDS} until one of the following is true:

* {USABLE_FILTERED_GUARDS} is large enough, OR
* {SAMPLED_GUARDS} is at its maximum size.

** Rationale **

These filters are applied after sampling: if we applied them before the sampling, then our sample would reflect the set of filtering restrictions that we had in the past.

The confirmed-guard list. [Section:CONFIRMED]

[formerly USED_GUARDS]

We maintain a persistent ordered list, {list:CONFIRMED_GUARDS}. It contains guards that we have used before, in our preference order of using them. It is a subset of {SAMPLED_GUARDS}. For each guard in this list, we store persistently:

- {pvar:IDENTITY} Its fingerprint.

- {pvar:CONFIRMED_ON_DATE} When we added this guard to {CONFIRMED_GUARDS}. Randomized to a point in the past as RAND(now, {GUARD_LIFETIME}/10).

We append new members to {CONFIRMED_GUARDS} when we mark a circuit built through a guard as "for user traffic."

Whenever we remove a member from {SAMPLED_GUARDS}, we also remove it from {CONFIRMED_GUARDS}.

[Note: You can also regard the {CONFIRMED_GUARDS} list as a total ordering defined over a subset of {SAMPLED_GUARDS}.]

Definition: we call Guard A "higher priority" than another Guard B if, when A and B are both reachable, we would rather use A. We define priority as follows:

* Every guard in {CONFIRMED_GUARDS} has a higher priority than every guard not in {CONFIRMED_GUARDS}.
* Among guards in {CONFIRMED_GUARDS}, the one appearing earlier on the {CONFIRMED_GUARDS} list has a higher priority.
* Among guards that do not appear in {CONFIRMED_GUARDS}, {is_pending}==true guards have higher priority.
* Among those, the guard with earlier {last_tried_connect} time has higher priority.
* Finally, among guards that do not appear in {CONFIRMED_GUARDS} with {is_pending==false}, all have equal priority.

** Rationale **

We add elements to this ordering when we have actually used them for building a usable circuit. We could mark them at some other time (such as when we attempt to connect to them, or when we actually connect to them), but this approach keeps us from committing to a guard before we actually use it for sensitive traffic.

The Primary guards [Section:PRIMARY]

We keep a run-time non-persistent ordered list of {list:PRIMARY_GUARDS}. It is a subset of {FILTERED_GUARDS}. It contains {N_PRIMARY_GUARDS} elements.

To compute primary guards, take the ordered intersection of {CONFIRMED_GUARDS} and {FILTERED_GUARDS}, and take the first {N_PRIMARY_GUARDS} elements. If there are fewer than {N_PRIMARY_GUARDS} elements, append additional elements to PRIMARY_GUARDS chosen from ({FILTERED_GUARDS} - {CONFIRMED_GUARDS}), ordered in "sample order" (that is, by {ADDED_ON_DATE}).
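The computation in the paragraph above can be sketched as follows, with guards represented by opaque identifiers. The function name and argument layout are assumptions of this sketch, and it covers only the initial computation described in that paragraph.

```python
def compute_primary_guards(confirmed, filtered, sampled_order, n_primary=3):
    """Return the first n_primary guards in priority order.

    confirmed: {CONFIRMED_GUARDS} as an ordered list of guard ids.
    filtered: iterable of guard ids in {FILTERED_GUARDS}.
    sampled_order: every sampled guard id, in sample order.
    """
    filtered_set = set(filtered)
    # Ordered intersection of CONFIRMED_GUARDS and FILTERED_GUARDS.
    primary = [g for g in confirmed if g in filtered_set][:n_primary]
    if len(primary) < n_primary:
        confirmed_set = set(confirmed)
        # Fill up from (FILTERED_GUARDS - CONFIRMED_GUARDS), in sample order.
        extra = [g for g in sampled_order
                 if g in filtered_set and g not in confirmed_set]
        primary.extend(extra[: n_primary - len(primary)])
    return primary
```

For example, with confirmed guards c and a, and a sample ordered a, b, c, d that is entirely filtered-in, the primary list is c, a, b.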

Once an element has been added to {PRIMARY_GUARDS}, we do not remove it until it is replaced by some element from {CONFIRMED_GUARDS}. That is: if a non-primary guard becomes confirmed and not every primary guard is confirmed, then the list of primary guards is regenerated, first from the confirmed guards (as before), and then from any non-confirmed primary guards.

Note that {PRIMARY_GUARDS} do not have to be in {USABLE_FILTERED_GUARDS}: they might be unreachable.

** Rationale **

These guards are treated differently from other guards. If one of them is usable, then we use it right away. For other guards in {FILTERED_GUARDS}, if one is usable, then before using it we might first double-check whether perhaps one of the primary guards is usable after all.

Retrying guards. [Section:RETRYING]

(We run this process as frequently as needed. It can be done once a second, or just-in-time.)

If a primary sampled guard's {is_reachable} status is <no>, then we decide whether to update its {is_reachable} status to <maybe> based on its {last_tried_connect} time, its {failing_since} time, and the {PRIMARY_GUARDS_RETRY_SCHED} schedule.

If a non-primary sampled guard's {is_reachable} status is <no>, then we decide whether to update its {is_reachable} status to <maybe> based on its {last_tried_connect} time, its {failing_since} time, and the {GUARDS_RETRY_SCHED} schedule.
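One way to apply a retry schedule is sketched below, using the legacy C tor schedule for primary guards listed in [PARAM_VALS]. Reading the stages as cumulative (the "next 90 hours" begins once the first six hours end) is an assumption of this sketch, as are the dict keys and function names.

```python
from datetime import datetime, timedelta

# (length of stage, retry interval during that stage), legacy primary schedule.
PRIMARY_GUARDS_RETRY_SCHED = [
    (timedelta(hours=6), timedelta(minutes=10)),
    (timedelta(hours=90), timedelta(minutes=90)),
    (timedelta(days=3), timedelta(hours=4)),
]
PRIMARY_FINAL_INTERVAL = timedelta(hours=9)  # "every 9 hours thereafter"

def retry_interval(failing_for, sched=PRIMARY_GUARDS_RETRY_SCHED,
                   final=PRIMARY_FINAL_INTERVAL):
    """Return how long to wait between attempts, given time spent failing."""
    stage_end = timedelta(0)
    for length, interval in sched:
        stage_end += length
        if failing_for < stage_end:
            return interval
    return final

def maybe_mark_retriable(guard, now):
    """Flip a <no> guard back to <maybe> once its retry interval has passed."""
    if guard["is_reachable"] != "no":
        return
    failing_for = now - guard["failing_since"]
    if now - guard["last_tried_connect"] >= retry_interval(failing_for):
        guard["is_reachable"] = "maybe"
```

The same functions serve non-primary guards if the {GUARDS_RETRY_SCHED} stages are passed as `sched` and `final`.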

** Rationale **

An observation that a guard has been 'unreachable' only lasts for a given amount of time, since we can't infer that it's unreachable now from the fact that it was unreachable a few minutes ago.

Selecting guards for circuits. [Section:SELECTING]

Every origin circuit is now in one of these states:

<state:usable_on_completion>, <state:usable_if_no_better_guard>, <state:waiting_for_better_guard>, or <state:complete>.

You may only attach streams to <complete> circuits. (Additionally, you may only send RENDEZVOUS cells, ESTABLISH_INTRO cells, and INTRODUCE cells on <complete> circuits.)

The per-circuit state machine is:

* New circuits are <usable_on_completion> or <usable_if_no_better_guard>.
* A <usable_on_completion> circuit may become <complete>, or may fail.
* A <usable_if_no_better_guard> circuit may become <usable_on_completion>; may become <waiting_for_better_guard>; or may fail.
* A <waiting_for_better_guard> circuit will become <complete>, or will be closed, or will fail.
* A <complete> circuit remains <complete> until it fails or is closed.

Each of these transitions is described below.

We keep, as global transient state:

* {tvar:last_time_on_internet} -- the last time at which we successfully used a circuit or connected to a guard. At startup we set this to "infinitely far in the past."

When we want to build a circuit, and we need to pick a guard:

* If any entry in PRIMARY_GUARDS has {is_reachable} status of <maybe> or <yes>, return one of the first {NUM_USABLE_PRIMARY_GUARDS} or {NUM_USABLE_PRIMARY_DIRECTORY_GUARDS} such guards, chosen uniformly at random. The circuit is <usable_on_completion>.

  [Note: We do not use {is_pending} on primary guards, since we are willing to try to build multiple circuits through them before we know for sure whether they work, and since we will not use any non-primary guards until we are sure that the primary guards are all down. (XX is this good?)]

* Otherwise, if the ordered intersection of {CONFIRMED_GUARDS} and {USABLE_FILTERED_GUARDS} is nonempty, return the first entry in that intersection that has {is_pending} set to false. Set its value of {is_pending} to true, and set its {pending_since} to the current time. The circuit is now <usable_if_no_better_guard>. (If all entries have {is_pending} true, pick the first one.)

* Otherwise, if there is no such entry, select a member from {USABLE_FILTERED_GUARDS} in sample order. Set its {is_pending} field to true, and set its {pending_since} to the current time. The circuit is <usable_if_no_better_guard>.

* Otherwise, if USABLE_FILTERED_GUARDS is empty, we have exhausted all the sampled guards. In this case we proceed by marking all guards as <maybe> reachable so that we can keep on trying circuits.
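The selection steps above can be sketched as one function. This is a minimal illustration under stated assumptions: guards are identified by strings, their transient state lives in a dict of dicts (`guards`), `usable_filtered` is already in sample order, and when the sample is exhausted the function resets reachability and returns (None, None) so that the caller can try again. None of these names or conventions come from this specification.

```python
import random

USABLE = ("yes", "maybe")

def select_guard(guards, primary, confirmed, usable_filtered, now,
                 num_usable_primary=1, rng=random):
    """Return (guard_id, initial_circuit_state) for a new circuit attempt."""
    usable_primary = [g for g in primary if guards[g]["is_reachable"] in USABLE]
    if usable_primary:
        # Pick uniformly among the first few usable primary guards.
        chosen = rng.choice(usable_primary[:num_usable_primary])
        state = "usable_on_completion"
    elif usable_filtered:
        usable = set(usable_filtered)
        candidates = [g for g in confirmed if g in usable]
        if candidates:
            # Prefer the first confirmed guard that is not already pending.
            not_pending = [g for g in candidates if not guards[g]["is_pending"]]
            chosen = not_pending[0] if not_pending else candidates[0]
        else:
            chosen = usable_filtered[0]  # first usable guard in sample order
        guards[chosen]["is_pending"] = True
        guards[chosen]["pending_since"] = now
        state = "usable_if_no_better_guard"
    else:
        # Sample exhausted: mark every guard <maybe> so that retries can proceed.
        for info in guards.values():
            info["is_reachable"] = "maybe"
        return None, None
    guards[chosen]["last_tried_connect"] = now
    return chosen, state
```

Note that {is_pending} is only ever set on non-primary choices, matching the note in the first step.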

Whenever we select a guard for a new circuit attempt, we update the {last_tried_connect} time for the guard to 'now.'

In some cases (for example, when we need a certain directory feature, or when we need to avoid using a certain exit as a guard), we need to restrict the guards that we use for a single circuit. When this happens, we remember the restrictions that applied when choosing the guard for that circuit, since we will need them later (see [UPDATE_WAITING]).

** Rationale **

We're getting to the core of the algorithm here. Our main goals are to make sure that

1. If it's possible to use a primary guard, we do.
2. We probably use the first primary guard.

So we only try non-primary guards if we're pretty sure that all the primary guards are down, and we only try a given primary guard if the earlier primary guards seem down.

When we do try non-primary guards, however, we only build one circuit through each, to give it a chance to succeed or fail. If ever such a circuit succeeds, we don't use it until we're pretty sure that it's the best guard we're getting (see below).

[XXX timeout.]

When a circuit fails. [Section:ON_FAIL]

When a circuit fails in a way that makes us conclude that a guard is not reachable, we take the following steps:

* Set the guard's {is_reachable} status to <no>. If it had {is_pending} set to true, we make it non-pending and clear {pending_since}.
* Close the circuit, of course. (This removes it from consideration by the algorithm in [UPDATE_WAITING].)
* Update the list of waiting circuits. (See [UPDATE_WAITING] below.)

[Note: the existing Tor logic will cause us to create more circuits in response to some of these steps; and also see [ON_CONSENSUS].]

** Rationale **

See [SELECTING] above for rationale.

When a circuit succeeds [Section:ON_SUCCESS]

When a circuit succeeds in a way that makes us conclude that a guard was reachable, we take these steps:

* We set its {is_reachable} status to <yes>.
* We set its {failing_since} to "never".
* If the guard was {is_pending}, we clear the {is_pending} flag and clear {pending_since}.
* If the guard was not a member of {CONFIRMED_GUARDS}, we add it to the end of {CONFIRMED_GUARDS}.
* If this circuit was <usable_on_completion>, this circuit is now <complete>. You may attach streams to this circuit, and use it for hidden services.
* If this circuit was <usable_if_no_better_guard>, it is now <waiting_for_better_guard>. You may not yet attach streams to it. Then check whether the {last_time_on_internet} is more than {INTERNET_LIKELY_DOWN_INTERVAL} seconds ago:

  * If it is, then mark all {PRIMARY_GUARDS} as "maybe" reachable.
  * If it is not, update the list of waiting circuits. (See [UPDATE_WAITING] below)

[Note: the existing Tor logic will cause us to create more circuits in response to some of these steps; and see [ON_CONSENSUS].]

** Rationale **

See [SELECTING] above for rationale.

Updating the list of waiting circuits [Section:UPDATE_WAITING]

We run this procedure whenever it's possible that a <waiting_for_better_guard> circuit might be ready to be called <complete>.

* If any circuit C1 is <waiting_for_better_guard>, AND:

  * All primary guards have reachable status of <no>.
  * There is no circuit C2 that "blocks" C1.

  Then, upgrade C1 to <complete>.

Definition: In the algorithm above, C2 "blocks" C1 if:

* C2 obeys all the restrictions that C1 had to obey, AND
* C2 has higher priority than C1, AND
* Either C2 is <complete>, or C2 is <waiting_for_better_guard>, or C2 has been <usable_if_no_better_guard> for no more than {NONPRIMARY_GUARD_CONNECT_TIMEOUT} seconds.

We run this procedure periodically:

* If any circuit stays in <waiting_for_better_guard> for more than {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds, time it out.

**Rationale**

If we open a connection to a guard, we might want to use it immediately (if we're sure that it's the best we can do), or we might want to wait a little while to see if some other circuit which we like better will finish.

When we mark a circuit <complete>, we don't close the lower-priority circuits immediately: we might decide to use them after all if the <complete> circuit goes down before {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds.
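The "blocks" relation and the upgrade step of this section can be sketched as follows. The circuit representation is hypothetical: each circuit is a dict with a numeric "priority" (smaller means higher priority), the time it entered its current state, and a set of guards excluded by its restrictions, which is one simple way to model "C2 obeys all the restrictions that C1 had to obey".

```python
NONPRIMARY_GUARD_CONNECT_TIMEOUT = 15  # seconds, suggested value

def blocks(c2, c1, now):
    """Return True if circuit c2 "blocks" circuit c1."""
    if c2["guard"] in c1["excluded_guards"]:
        return False  # c2 would not satisfy c1's restrictions
    if c2["priority"] >= c1["priority"]:
        return False  # c2 is not higher priority (smaller number wins)
    if c2["state"] in ("complete", "waiting_for_better_guard"):
        return True
    return (c2["state"] == "usable_if_no_better_guard"
            and now - c2["state_since"] <= NONPRIMARY_GUARD_CONNECT_TIMEOUT)

def update_waiting(circuits, primary_reachability, now):
    """Upgrade unblocked <waiting_for_better_guard> circuits to <complete>."""
    if any(status != "no" for status in primary_reachability):
        return  # some primary guard might still work
    for c1 in circuits:
        if c1["state"] != "waiting_for_better_guard":
            continue
        if not any(blocks(c2, c1, now) for c2 in circuits if c2 is not c1):
            c1["state"] = "complete"
```

A lower-priority waiting circuit stays waiting while a higher-priority one is complete or waiting, which is the behavior the rationale above describes.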

Without a list of waiting circuits [Section:NO_CIRCLIST]

As an alternative to the section [SECTION:UPDATE_WAITING] above, this section presents a new way to maintain guard status independently of tracking individual circuit status. This formulation gives a result equivalent or similar to the approach above, but simplifies the necessary communications between the guard and circuit subsystems.

As before, when all primary guards are Unreachable, we need to try non-primary guards. We select the first such guard (in preference order) that is neither Unreachable nor Pending. Whenever we give out such a guard, if the guard's status is Unknown, then we call that guard "Pending" with its {is_pending} flag, until the attempt to use it succeeds or fails. We remember when the guard became Pending with the {pending_since} variable.

After completing a circuit, the implementation must check whether its guard is usable. A guard's usability status may be "usable", "unusable", or "unknown". A guard is usable according to these rules:

Primary guards are always usable.

Non-primary guards are usable for a given circuit if every guard earlier in the preference list is either unsuitable for that circuit (e.g. because of family restrictions), or marked as Unreachable, or has been pending for at least {NONPRIMARY_GUARD_CONNECT_TIMEOUT}.

Non-primary guards are not usable for a given circuit if some guard earlier in the preference list is suitable for the circuit and Reachable.

Non-primary guards are unusable if they have not become usable after {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds.

If a circuit's guard is neither usable nor unusable immediately, the circuit is not discarded; instead, it is kept (but not used) until the guard becomes usable or unusable.
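The usability rules of this section can be sketched as a single check run after a circuit completes. Several things here are assumptions of the sketch rather than statements of the specification: the dict layout, the caller-supplied `suitable` predicate, measuring the idle timeout from the time the circuit completed, and mapping the "not usable" rule onto the "unusable" status.

```python
NONPRIMARY_GUARD_CONNECT_TIMEOUT = 15  # seconds, suggested value
NONPRIMARY_GUARD_IDLE_TIMEOUT = 600    # 10 minutes, suggested value

def guard_usability(index, preference_list, is_primary, suitable, now,
                    circuit_done_at):
    """Classify the guard at preference_list[index] for one circuit.

    Returns "usable", "unusable", or "unknown".
    suitable(guard) says whether a guard could serve this circuit at all.
    """
    if is_primary:
        return "usable"
    earlier = preference_list[:index]

    def out_of_the_way(g):
        pending_long = (g["pending_since"] is not None and
                        now - g["pending_since"] >= NONPRIMARY_GUARD_CONNECT_TIMEOUT)
        return not suitable(g) or g["status"] == "unreachable" or pending_long

    if any(suitable(g) and g["status"] == "reachable" for g in earlier):
        return "unusable"  # a better guard is known to work
    if all(out_of_the_way(g) for g in earlier):
        return "usable"
    if now - circuit_done_at > NONPRIMARY_GUARD_IDLE_TIMEOUT:
        return "unusable"  # waited too long without becoming usable
    return "unknown"
```

A circuit whose guard comes back "unknown" would be kept but not used, and the check repeated later.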

Whenever we get a new consensus. [Section:ON_CONSENSUS]

We update {GUARDS}.

For every guard in {SAMPLED_GUARDS}, we update {IS_LISTED} and {FIRST_UNLISTED_AT}.

[**] We remove entries from {SAMPLED_GUARDS} if appropriate, according to the sampled-guards expiration rules. If they were in {CONFIRMED_GUARDS}, we also remove them from {CONFIRMED_GUARDS}.

We recompute {FILTERED_GUARDS}, and everything that derives from it, including {USABLE_FILTERED_GUARDS}, and {PRIMARY_GUARDS}.

(Whenever one of the configuration options that affects the filter is updated, we repeat the process above, starting at the [**] line.)

4.11. Deciding whether to generate a new circuit. [Section:NEW_CIRCUIT_NEEDED]

We generate a new circuit when we don't have enough circuits either built or in-progress to handle a given stream, or an expected stream.

For the purpose of this rule, we say that <waiting_for_better_guard> circuits are neither built nor in-progress; that <complete> circuits are built; and that the other states are in-progress.

4.12. When we are missing descriptors. [Section:MISSING_DESCRIPTORS]

We need either a router descriptor or a microdescriptor in order to build a circuit through a guard. If we do not have such a descriptor for a guard, we can still use the guard for one-hop directory fetches, but not for longer circuits.

(Also, when we are missing descriptors for our first {NUM_USABLE_PRIMARY_GUARDS} primary guards, we don't build circuits at all until we have fetched them.)

Appendices

Acknowledgements

This research was supported in part by NSF grants CNS-1111539, CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548.

Parameters with suggested values. [Section:PARAM_VALS]

(All suggested values chosen arbitrarily)

{param:MAX_SAMPLE_THRESHOLD} -- 20%

{param:MAX_SAMPLE_SIZE} -- 60

{param:GUARD_LIFETIME} -- 120 days

{param:REMOVE_UNLISTED_GUARDS_AFTER} -- 20 days [previously ENTRY_GUARD_REMOVE_AFTER]

{param:MIN_FILTERED_SAMPLE} -- 20

{param:N_PRIMARY_GUARDS} -- 3

{param:PRIMARY_GUARDS_RETRY_SCHED}

We recommend the following schedule, which is the one used in Arti:

  -- Use the "decorrelated-jitter" algorithm from "dir-spec.txt" section 5.5 where `base_delay` is 30 seconds and `cap` is 6 hours.

This legacy schedule is the one used in C tor:

  -- every 10 minutes for the first six hours,
  -- every 90 minutes for the next 90 hours,
  -- every 4 hours for the next 3 days,
  -- every 9 hours thereafter.

{param:GUARDS_RETRY_SCHED}

We recommend the following schedule, which is the one used in Arti:

  -- Use the "decorrelated-jitter" algorithm from "dir-spec.txt" section 5.5 where `base_delay` is 10 minutes and `cap` is 36 hours.

This legacy schedule is the one used in C tor:

  -- every hour for the first six hours,
  -- every 4 hours for the next 90 hours,
  -- every 18 hours for the next 3 days,
  -- every 36 hours thereafter.

{param:INTERNET_LIKELY_DOWN_INTERVAL} -- 10 minutes

{param:NONPRIMARY_GUARD_CONNECT_TIMEOUT} -- 15 seconds

{param:NONPRIMARY_GUARD_IDLE_TIMEOUT} -- 10 minutes

{param:MEANINGFUL_RESTRICTION_FRAC} -- .2

{param:EXTREME_RESTRICTION_FRAC} -- .01

{param:GUARD_CONFIRMED_MIN_LIFETIME} -- 60 days

{param:NUM_USABLE_PRIMARY_GUARDS} -- 1

{param:NUM_USABLE_PRIMARY_DIRECTORY_GUARDS} -- 3

Random values [Section:RANDOM]

Frequently, we want to randomize the expiration time of something so that it's not easy for an observer to match it to its start time. We do this by randomizing its start date a little, so that we only need to remember a fixed expiration interval.

By RAND(now, INTERVAL) we mean a time between now and INTERVAL in the past, chosen uniformly at random.
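A direct rendering of this definition, as a minimal sketch: the function name `rand_past` is hypothetical, and the optional `rng` argument exists only so that callers can supply their own random source.

```python
import random
from datetime import datetime, timedelta

def rand_past(now, interval, rng=random):
    """Return a time chosen uniformly between (now - interval) and now."""
    offset_seconds = rng.uniform(0, interval.total_seconds())
    return now - timedelta(seconds=offset_seconds)
```

For instance, {ADDED_ON_DATE} would be set with an interval of {GUARD_LIFETIME}/10, which is 12 days under the suggested values.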

Why not a sliding scale of primaryness? [Section:CVP]

At one meeting, I floated the idea of having "primaryness" be a continuous variable rather than a boolean.

I'm no longer sure this is a great idea, but I'll try to outline how it might work.

To begin with: being "primary" gives it a few different traits:

1) We retry primary guards more frequently. [Section:RETRYING]

2) We don't even _try_ building circuits through lower-priority guards until we're pretty sure that the higher-priority primary guards are down. (With non-primary guards, on the other hand, we launch exploratory circuits which we plan not to use if higher-priority guards succeed.) [Section:SELECTING]

3) We retry them all one more time if a circuit succeeds after the net has been down for a while. [Section:ON_SUCCESS]

We could make each of the above traits continuous:

1) We could make the interval at which a guard is retried depend continuously on its position in CONFIRMED_GUARDS.

2) We could change the number of guards we test in parallel based on their position in CONFIRMED_GUARDS.

3) We could change the rule for how long the higher-priority guards need to have been down before we call a <usable_if_no_better_guard> circuit <complete> based on a possible network-down condition. For example, we could retry the first guard if we tried it more than 10 seconds ago, the second if we tried it more than 20 seconds ago, etc.

I am pretty sure, however, that if these are worth doing, they need more analysis! Here's why:

* They all have the potential to leak more information about a guard's exact position on the list. Is that safe? Is there any way to exploit that? I don't think we know.
* They all seem like changes which it would be relatively simple to make to the code after we implement the simpler version of the algorithm described above.

Controller changes

We will add to control-spec.txt a new possible circuit state, GUARD_WAIT, that can be given as part of circuit events and GETINFO responses about circuits. A circuit is in the GUARD_WAIT state when it is fully built, but we will not use it because a circuit with a better guard might become built too.

Persistent state format

The persistent state format doesn't need to be part of this specification, since different implementations can do it differently. Nonetheless, here's the one Tor uses:

The "state" file contains one Guard entry for each sampled guard in each instance of the guard state (see section 2). The value of this Guard entry is a set of space-separated K=V entries, where K contains any nonspace character except =, and V contains any nonspace characters.

Implementations must retain any unrecognized K=V entries for a sampled guard when they regenerate the state file.

The order of K=V entries is not allowed to matter.

Recognized fields (values of K) are:

"in" -- the name of the guard state instance that this sampled guard is in. If a sampled guard is in two guard state instances, it appears twice, with a different "in" field each time. Required.

"rsa_id" -- the RSA id digest for this guard, encoded in hex. Required.

"bridge_addr" -- If the guard is a bridge, its configured address and port (this can be the ORPort or a pluggable transport port). Optional.

"nickname" -- the guard's nickname, if any. Optional.

"sampled_on" -- the date when the guard was sampled. Required.

"sampled_by" -- the Tor version that sampled this guard. Optional.

"unlisted_since" -- the date since which the guard has been unlisted. Optional.

"listed" -- 0 if the guard is not listed; 1 if it is. Required.

"confirmed_on" -- date when the guard was confirmed. Optional.

"confirmed_idx" -- position of the guard in the confirmed list. Optional.

"pb_use_attempts", "pb_use_successes", "pb_circ_attempts", "pb_circ_successes", "pb_successful_circuits_closed", "pb_collapsed_circuits", "pb_unusable_circuits", "pb_timeouts" -- state for the circuit path bias algorithm, given in decimal fractions. Optional.

All dates here are given as a (spaceless) ISO8601 combined date and time in UTC (e.g., 2016-11-29T19:39:31).
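A minimal sketch of reading and regenerating the value of one Guard entry follows. It assumes the caller has already split off the "Guard" keyword, and the function names are hypothetical. Unrecognized keys are carried through untouched, as required above, and because K cannot contain "=", splitting at the first "=" is enough even if V contains one.

```python
from datetime import datetime

REQUIRED_FIELDS = ("in", "rsa_id", "sampled_on", "listed")

def parse_guard_entry(value):
    """Parse space-separated K=V items into a dict, keeping unknown keys."""
    fields = {}
    for item in value.split():
        key, sep, val = item.partition("=")
        if not sep or not key:
            raise ValueError("malformed K=V item: %r" % item)
        fields[key] = val
    for name in REQUIRED_FIELDS:
        if name not in fields:
            raise ValueError("missing required field: %s" % name)
    return fields

def format_guard_entry(fields):
    """Regenerate the entry value; order is not significant to readers."""
    return " ".join("%s=%s" % (k, v) for k, v in fields.items())

def parse_state_date(text):
    """Parse the spaceless ISO8601 form used here, e.g. 2016-11-29T19:39:31."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
```

Round-tripping an entry through `parse_guard_entry` and `format_guard_entry` preserves fields that this sketch knows nothing about.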

Still non-addressed issues [Section:TODO]

Simulate to answer: Will this work in a dystopic world?

Simulate actual behavior.

For all lifetimes: instead of storing the "this began at" time, store the "remove this at" time, slightly randomized.

Clarify that when you get a circuit, you might need to relaunch circuits through that same guard immediately, if they are circuits that have to be independent.

Fix all items marked XX or TODO.

"Directory guards" -- do they matter?

Suggestion: require that all guards support downloads via BEGINDIR. We don't need to worry about directory guards for relays, since we aren't trying to prevent relay enumeration.

IP version preferences via ClientPreferIPv6ORPort

Suggestion: Treat it as a preference when adding to {CONFIRMED_GUARDS}, but not otherwise.

Tor Padding Specification

Mike Perry, George Kadianakis

Note: This is an attempt to specify Tor as currently implemented. Future versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to use cover traffic to obscure various traffic patterns from external and internal observers. Other implementations MAY take other approaches, but implementors should be aware of the anonymity and load-balancing implications of their choices.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Table of Contents

    1. Overview
    2. Connection-level padding
        2.1. Background
        2.2. Implementation
        2.3. Padding Cell Timeout Distribution Statistics
        2.4. Maximum overhead bounds
        2.5. Reducing or Disabling Padding via Negotiation
        2.6. Consensus Parameters Governing Behavior
    3. Circuit-level padding
        3.1. Circuit Padding Negotiation
        3.2. Circuit Padding Machine Message Management
        3.3. Obfuscating client-side onion service circuit setup
            3.3.1. Common general circuit construction sequences
            3.3.2. Client-side onion service introduction circuit obfuscation
            3.3.3. Client-side rendezvous circuit hiding
            3.3.4. Circuit setup machine overhead
        3.4. Circuit padding consensus parameters
    A. Acknowledgments

Overview

Tor supports two classes of cover traffic: connection-level padding, and circuit-level padding.

Connection-level padding uses the CELL_PADDING cell command for cover traffic, whereas circuit-level padding uses the RELAY_COMMAND_DROP relay command. CELL_PADDING is single-hop only and can be differentiated from normal traffic by Tor relays ("internal" observers), but not by entities monitoring Tor OR connections ("external" observers).

RELAY_COMMAND_DROP is multi-hop, and is not visible to intermediate Tor relays, because the relay command field is covered by circuit layer encryption. Moreover, Tor's 'recognized' field allows RELAY_COMMAND_DROP padding to be sent to any intermediate node in a circuit (as per Section 6.1 of tor-spec.txt).

Tor uses both connection-level and circuit-level padding. Connection-level padding is described in section 2. Circuit-level padding is described in section 3.

The circuit-level padding system is completely orthogonal to the connection-level padding. The connection-level padding system regards circuit-level padding as normal data traffic, and hence the connection-level padding system will not add any additional overhead while the circuit-level padding system is actively padding.

Connection-level padding

Background

Tor clients and relays make use of CELL_PADDING to reduce the resolution of connection-level metadata retention by ISPs and surveillance infrastructure.

Such metadata retention is implemented by Internet routers in the form of Netflow, jFlow, Netstream, or IPFIX records. These records are emitted by gateway routers in a raw form and then exported (often over plaintext) to a "collector" that either records them verbatim, or reduces their granularity further[1].

Netflow records and the associated data collection and retention tools are very configurable, and have many modes of operation, especially when configured to handle high throughput. However, at ISP scale, per-flow records are very likely to be employed, since they are the default, and also provide very high resolution in terms of endpoint activity, second only to full packet and/or header capture.

Per-flow records record the endpoint connection 5-tuple, as well as the total number of bytes sent and received by that 5-tuple during a particular time period. They can store additional fields as well, but it is primarily timing and bytecount information that concern us.

When configured to provide per-flow data, routers emit these raw flow records periodically for all active connections passing through them based on two parameters: the "active flow timeout" and the "inactive flow timeout".

The "active flow timeout" causes the router to emit a new record periodically for every active TCP session that continuously sends data. The default active flow timeout for most routers is 30 minutes, meaning that a new record is created for every TCP session at least every 30 minutes, no matter what. This value can be configured from 1 minute to 60 minutes on major routers.

The "inactive flow timeout" is used by routers to create a new record if a TCP session is inactive for some number of seconds. It allows routers to avoid the need to track a large number of idle connections in memory, and instead emit a separate record only when there is activity. This value ranges from 10 seconds to 600 seconds on common routers. It appears as though no routers support a value lower than 10 seconds.

For reference, here are default values and ranges (in parentheses when known) for common routers, along with citations to their manuals.

Some routers speak other collection protocols than Netflow, and in the case of Juniper, use different timeouts for these protocols. Where this is known to happen, it has been noted.

                               Inactive Timeout    Active Timeout
    Cisco IOS[3]               15s (10-600s)       30min (1-60min)
    Cisco Catalyst[4]          5min                32min
    Juniper (jFlow)[5]         15s (10-600s)       30min (1-60min)
    Juniper (Netflow)[6,7]     60s (10-600s)       30min (1-30min)
    H3C (Netstream)[8]         60s (60-600s)       30min (1-60min)
    Fortinet[9]                15s                 30min
    MicroTik[10]               15s                 30min
    nProbe[14]                 30s                 120s
    Alcatel-Lucent[2]          15s (10-600s)       30min (1-600min)

The combination of the active and inactive netflow record timeouts allows us to devise a low-cost padding defense that causes what would otherwise be split records to "collapse" at the router even before they are exported to the collector for storage. So long as a connection transmits data before the "inactive flow timeout" expires, the router will continue to count the total bytes on that flow before finally emitting a record at the "active flow timeout".

This means that for a minimal amount of padding that prevents the "inactive flow timeout" from expiring, it is possible to reduce the resolution of raw per-flow netflow data to the total number of bytes sent and received in a 30 minute window. This is a vast reduction in resolution for HTTP, IRC, XMPP, SSH, and other intermittent interactive traffic, especially when all user traffic in that time period is multiplexed over a single connection (as it is with Tor).

Though flow measurement in principle can be bidirectional (counting cells sent in both directions between a pair of IPs) or unidirectional (counting only cells sent from one IP to another), we assume for safety that all measurement is unidirectional, and so traffic must be sent by both parties in order to prevent record splitting.

Implementation

Tor clients currently maintain one TLS connection to their Guard node to carry actual application traffic, and make up to 3 additional connections to other nodes to retrieve directory information.

We pad only the client's connection to the Guard node, and not any other connection. We treat Bridge node connections to the Tor network as client connections, and pad them, but otherwise do not pad between normal relays.

Both clients and Guards will maintain a timer for all application (i.e., non-directory) TLS connections. Every time a padding packet is sent by an endpoint, that endpoint will sample a timeout value from the max(X,X) distribution described in Section 2.3. The default range is from 1.5 seconds to 9.5 seconds, subject to consensus parameters as specified in Section 2.6.

(The timing is randomized to avoid making it obvious which cells are padding.)

If another cell is sent for any reason before this timer expires, the timer is reset to a new random value.

If the connection remains inactive until the timer expires, a single CELL_PADDING cell will be sent on that connection (which will also start a new timer).

In this way, the connection will only be padded in a given direction in the event that it is idle in that direction, and will always transmit a packet before the minimum 10 second inactive timeout.

(In practice, an implementation may not be able to determine when, exactly, a cell is sent on a given channel. For example, even though the cell has been given to the kernel via a call to send(2), the kernel may still be buffering that cell. In cases such as these, implementations should use a reasonable proxy for the time at which a cell is sent: for example, when the cell is queued. If this strategy is used, implementations should try to observe the innermost (closest to the wire) queue that they practically can, and if this queue is already nonempty, padding should not be scheduled until after the queue does become empty.)
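The timer behavior described above can be sketched as follows. This is a minimal illustration, not the mainline implementation: the names PaddingTimer, on_cell_sent, and poll are hypothetical, time is passed in explicitly as milliseconds, and the default range of 1500..9500 ms is the one given in this section.

```python
import random


def sample_padding_timeout_ms(low_ms=1500, high_ms=9500, rng=random):
    """Sample a timeout from low_ms + max(X, X), X uniform on 0..R-1."""
    r = high_ms - low_ms
    if r <= 0:
        return low_ms
    return low_ms + max(rng.randrange(r), rng.randrange(r))


class PaddingTimer:
    """Hypothetical per-connection, per-direction padding timer."""

    def __init__(self, now_ms, low_ms=1500, high_ms=9500, rng=random):
        self.low_ms = low_ms
        self.high_ms = high_ms
        self.rng = rng
        self.deadline_ms = now_ms
        self.on_cell_sent(now_ms)

    def on_cell_sent(self, now_ms):
        # Any cell sent (padding or not) resets the timer to a new random value.
        self.deadline_ms = now_ms + sample_padding_timeout_ms(
            self.low_ms, self.high_ms, self.rng)

    def poll(self, now_ms):
        """Return True if a single CELL_PADDING cell should be sent now."""
        if now_ms >= self.deadline_ms:
            # Sending the padding cell also starts a new timer.
            self.on_cell_sent(now_ms)
            return True
        return False
```

A caller would invoke on_cell_sent() whenever it queues any cell on the connection, and poll() from its event loop; when poll() returns True it sends one CELL_PADDING cell.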

Padding Cell Timeout Distribution Statistics

To limit the amount of padding sent, instead of sampling each endpoint timeout uniformly, we instead sample it from max(X,X), where X is uniformly distributed.

If X is a random variable uniform from 0..R-1 (where R=high-low), then the random variable Y = max(X,X) has Prob(Y == i) = (2.0*i + 1)/(R*R).

Then, when both sides apply timeouts sampled from Y, the resulting bidirectional padding packet rate is now a third random variable: Z = min(Y,Y).

The distribution of Z is slightly bell-shaped, but mostly flat around the mean. It also turns out that Exp[Z] ~= Exp[X]. Here's a table of average values for each random variable:

    R        Exp[X]    Exp[Z]    Exp[min(X,X)]   Exp[Y=max(X,X)]
    2000      999.5    1066        666.2           1332.8
    3000     1499.5    1599.5      999.5           1999.5
    5000     2499.5    2666       1666.2           3332.8
    6000     2999.5    3199.5     1999.5           3999.5
    7000     3499.5    3732.8     2332.8           4666.2
    8000     3999.5    4266.2     2666.2           5332.8
    10000    4999.5    5328       3332.8           6666.2
    15000    7499.5    7995       4999.5           9999.5
    20000    9999.5    10661      6666.2           13332.8
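The Exp[X], Exp[min(X,X)], and Exp[Y] columns can be checked directly from the probability mass function given above. The sketch below is illustrative only; the function names are hypothetical, and it uses exact rational arithmetic plus the identity min + max = X1 + X2 for two independent draws. It does not attempt to reproduce the Exp[Z] column.

```python
from fractions import Fraction


def pmf_max(i, r):
    """Prob(Y == i) for Y = max(X, X), with X uniform on 0..r-1."""
    return Fraction(2 * i + 1, r * r)


def expected_values(r):
    """Return (Exp[X], Exp[min(X,X)], Exp[max(X,X)]) as floats."""
    exp_x = Fraction(r - 1, 2)
    exp_max = sum(i * pmf_max(i, r) for i in range(r))
    # For two independent draws, min + max equals the sum of the draws.
    exp_min = 2 * exp_x - exp_max
    return float(exp_x), float(exp_min), float(exp_max)


if __name__ == "__main__":
    for r in (2000, 5000, 8000):
        print(r, [round(v, 1) for v in expected_values(r)])
```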

Maximum overhead bounds

With the default parameters and the above distribution, we expect a padded connection to send one padding cell every 5.5 seconds. This averages to 103 bytes per second full duplex (~52 bytes/sec in each direction), assuming a 512 byte cell and 55 bytes of TLS+TCP+IP headers. For a client connection that remains otherwise idle for its expected ~50 minute lifespan (governed by the circuit available timeout plus a small additional connection timeout), this is about 154.5KB of overhead in each direction (309KB total).

With 2.5M completely idle clients connected simultaneously, 52 bytes per second amounts to 130MB/second in each direction network-wide, which is roughly the current amount of Tor directory traffic[11]. Of course, our 2.5M daily users will neither be connected simultaneously, nor entirely idle, so we expect the actual overhead to be much lower than this.
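The arithmetic behind these bounds can be reproduced as follows. This is only a worked calculation using the figures stated in this section (512-byte cell, 55 bytes of headers, one padding cell every 5.5 seconds, a 50 minute lifespan); the function name is hypothetical.

```python
def connection_padding_overhead(interval_s=5.5, cell_bytes=512,
                                header_bytes=55, lifespan_s=50 * 60):
    """Return (full-duplex bytes/sec, per-direction bytes/sec,
    per-direction bytes over the connection lifespan)."""
    full_duplex = (cell_bytes + header_bytes) / interval_s
    per_direction = full_duplex / 2
    lifetime_per_direction = per_direction * lifespan_s
    return full_duplex, per_direction, lifetime_per_direction


if __name__ == "__main__":
    full, per_dir, lifetime = connection_padding_overhead()
    print(f"{full:.1f} B/s full duplex, {per_dir:.1f} B/s per direction")
    print(f"{lifetime / 1000:.1f} KB per direction over 50 minutes")
    # Network-wide worst case for 2.5M idle clients, in MB/s per direction,
    # using the rounded per-direction rate as the text does.
    print(round(per_dir) * 2_500_000 / 1_000_000, "MB/s")
```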

Reducing or Disabling Padding via Negotiation

To allow mobile clients to either disable or reduce their padding overhead, the CELL_PADDING_NEGOTIATE cell (tor-spec.txt section 7.2) may be sent from clients to relays. This cell is used to instruct relays to cease sending padding.

If the client has opted to use reduced padding, it continues to send padding cells sampled from the range [9000,14000] milliseconds (subject to consensus parameter alteration as per Section 2.6), still using the Y=max(X,X) distribution. Since the padding is now unidirectional, the expected frequency of padding cells is now governed by the Y distribution above as opposed to Z. For a range of 5000ms, we can see that we expect to send a padding packet every 9000+3332.8 = 12332.8ms. We also halve the circuit available timeout from ~50min down to ~25min, which causes the client's OR connections to be closed shortly thereafter when it is idle, thus reducing overhead.

These two changes cause the padding overhead to go from 309KB per one-time-use Tor connection down to 69KB per one-time-use Tor connection. For continual usage, the maximum overhead goes from 103 bytes/sec down to 46 bytes/sec.
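The reduced-padding figures follow from the same kind of calculation, now with unidirectional padding, an expected interval of 12332.8 ms, and a ~25 minute lifespan. The sketch below is a worked calculation under those stated values, assuming the same 512-byte cell and 55 bytes of headers used for the default case; the function name is hypothetical.

```python
def reduced_padding_overhead(low_ms=9000, exp_y_ms=3332.8,
                             cell_bytes=512, header_bytes=55,
                             lifespan_s=25 * 60):
    """Return (bytes/sec, total bytes over the lifespan) for
    unidirectional reduced padding."""
    interval_s = (low_ms + exp_y_ms) / 1000.0
    rate = (cell_bytes + header_bytes) / interval_s
    return rate, rate * lifespan_s


if __name__ == "__main__":
    rate, total = reduced_padding_overhead()
    print(f"{rate:.1f} B/s, {total / 1000:.1f} KB per connection")
```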

If a client opts to completely disable padding, it sends a CELL_PADDING_NEGOTIATE to instruct the relay not to pad, and then does not send any further padding itself.

Currently, clients negotiate padding only when a channel is created, immediately after sending their NETINFO cell. Recipients SHOULD, however, accept padding negotiation messages at any time.

If a client previously negotiated reduced or disabled padding, and wishes to re-enable default padding (i.e., padding according to the consensus parameters), it SHOULD send CELL_PADDING_NEGOTIATE START with zero in the ito_low_ms and ito_high_ms fields. (It therefore SHOULD NOT copy the values from its own established consensus into the CELL_PADDING_NEGOTIATE cell.) This avoids the client needing to send updated padding negotiations if the consensus parameters should change. The recipient's clamping of the timing parameters will cause the recipient to use its notion of the consensus parameters.

Clients and bridges MUST reject padding negotiation messages from relays, and close the channel if they receive one.

Consensus Parameters Governing Behavior

Connection-level padding is controlled by the following consensus parameters:

  * nf_ito_low
    - The low end of the range to send padding when inactive, in ms.
    - Default: 1500

  * nf_ito_high
    - The high end of the range to send padding, in ms.
    - Default: 9500
    - If nf_ito_low == nf_ito_high == 0, padding will be disabled.

  * nf_ito_low_reduced
    - For reduced padding clients: the low end of the range to send
      padding when inactive, in ms.
    - Default: 9000

  * nf_ito_high_reduced
    - For reduced padding clients: the high end of the range to send
      padding, in ms.
    - Default: 14000

  * nf_conntimeout_clients
    - The number of seconds to keep never-used circuits opened and
      available for clients to use. Note that the actual client timeout
      is randomized uniformly from this value to twice this value.
    - The number of seconds to keep idle (not currently used) canonical
      channels open and available. (We do this to ensure a sufficient
      time duration of padding, which is the ultimate goal.)
    - This value is also used to determine how long, after a port has
      been used, we should attempt to keep building predicted circuits
      for that port. (See path-spec.txt section 2.1.1.) This behavior
      was originally added to work around implementation limitations,
      but it serves as a reasonable default regardless of implementation.
    - For all use cases, reduced padding clients use half the consensus
      value.
    - Implementations MAY mark circuits held open past the reduced
      padding quantity (half the consensus value) as "not to be used for
      streams", to prevent their use from becoming a distinguisher.
    - Default: 1800

  * nf_pad_before_usage
    - If set to 1, OR connections are padded before the client uses them
      for any application traffic. If 0, OR connections are not padded
      until application data begins.
    - Default: 1

  * nf_pad_relays
    - If set to 1, we also pad inactive relay-to-relay connections.
    - Default: 0

  * nf_conntimeout_relays
    - The number of seconds that idle relay-to-relay connections are
      kept open.
    - Default: 3600

Circuit-level padding

The circuit padding system in Tor is an extension of the WTF-PAD event-driven state machine design[15]. At a high level, this design places one or more padding state machines at the client, and one or more padding state machines at a relay, on each circuit.

State transition and histogram generation have been generalized to be fully programmable, and probability distribution support was added to support more compact representations like APE[16]. Additionally, packet count limits, rate limiting, and circuit application conditions have been added.

At present, Tor uses this system to deploy two pairs of circuit padding machines, to obscure differences between the setup phase of client-side onion service circuits, up to the first 10 cells.

This specification covers only the resulting behavior of these padding machines, and thus does not cover the state machine implementation details or operation. For full details on using the circuit padding system to develop future padding defenses, see the research developer documentation[17].

Circuit Padding Negotiation

Circuit padding machines are advertised as "Padding" subprotocol versions (see tor-spec.txt Section 9). The onion service circuit padding machines are advertised as "Padding=2".

Because circuit padding machines only become active at certain points in circuit lifetime, and because more than one padding machine may be active at any given point in circuit lifetime, there is also a padding negotiation cell and a negotiated response. These are relay commands 41 and 42, with relay headers as per section 6.1 of tor-spec.txt.

The fields of the relay cell Data payload of a negotiate request are as follows:

    const CIRCPAD_COMMAND_STOP = 1;
    const CIRCPAD_COMMAND_START = 2;

    const CIRCPAD_RESPONSE_OK = 1;
    const CIRCPAD_RESPONSE_ERR = 2;

    const CIRCPAD_MACHINE_CIRC_SETUP = 1;

    struct circpad_negotiate {
      u8 version IN [0];
      u8 command IN [CIRCPAD_COMMAND_START, CIRCPAD_COMMAND_STOP];

      u8 machine_type IN [CIRCPAD_MACHINE_CIRC_SETUP];

      u8 unused; // Formerly echo_request

      u32 machine_ctr;
    };

When a client wants to start a circuit padding machine, it first checks that the desired destination hop advertises the appropriate subprotocol version for that machine. It then sends a circpad_negotiate cell to that hop with command=CIRCPAD_COMMAND_START, and machine_type=CIRCPAD_MACHINE_CIRC_SETUP (for the circ setup machine, the destination hop is the second hop in the circuit). The machine_ctr is the count of which machine instance this is on the circuit. It is used to disambiguate shutdown requests.

When a relay receives a circpad_negotiate cell, it checks that it supports the requested machine, and sends a circpad_negotiated cell, which is formatted in the data payload of a relay cell with command number 42 (see tor-spec.txt section 6.1), as follows:

    struct circpad_negotiated {
      u8 version IN [0];
      u8 command IN [CIRCPAD_COMMAND_START, CIRCPAD_COMMAND_STOP];
      u8 response IN [CIRCPAD_RESPONSE_OK, CIRCPAD_RESPONSE_ERR];

      u8 machine_type IN [CIRCPAD_MACHINE_CIRC_SETUP];

      u32 machine_ctr;
    };

If the machine is supported, the response field will contain CIRCPAD_RESPONSE_OK. If it is not, it will contain CIRCPAD_RESPONSE_ERR.

Either side may send a CIRCPAD_COMMAND_STOP to shut down the padding machines (clients MUST only send circpad_negotiate, and relays MUST only send circpad_negotiated for this purpose).

If the machine_ctr does not match the current machine instance count on the circuit, the command is ignored.
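As an illustration of the two payload layouts above, the following sketch encodes a circpad_negotiate payload and decodes a circpad_negotiated payload. It assumes the u32 field is in network (big-endian) byte order, as for other numeric fields in the Tor protocol; the helper names are hypothetical, and the relay cell header and encryption are out of scope.

```python
import struct

CIRCPAD_COMMAND_STOP = 1
CIRCPAD_COMMAND_START = 2
CIRCPAD_RESPONSE_OK = 1
CIRCPAD_RESPONSE_ERR = 2
CIRCPAD_MACHINE_CIRC_SETUP = 1

# Four u8 fields followed by one u32; big-endian is an assumption here.
_LAYOUT = struct.Struct(">BBBBI")


def encode_negotiate(command, machine_type, machine_ctr):
    """Build the data payload of a circpad_negotiate request."""
    if command not in (CIRCPAD_COMMAND_START, CIRCPAD_COMMAND_STOP):
        raise ValueError("unknown command")
    # version = 0, unused = 0
    return _LAYOUT.pack(0, command, machine_type, 0, machine_ctr)


def decode_negotiated(payload):
    """Parse the data payload of a circpad_negotiated response."""
    version, command, response, machine_type, machine_ctr = \
        _LAYOUT.unpack_from(payload)
    if version != 0:
        raise ValueError("unsupported version")
    if response not in (CIRCPAD_RESPONSE_OK, CIRCPAD_RESPONSE_ERR):
        raise ValueError("unknown response")
    return {
        "command": command,
        "ok": response == CIRCPAD_RESPONSE_OK,
        "machine_type": machine_type,
        "machine_ctr": machine_ctr,
    }
```

A client following the text above would also compare the decoded machine_ctr with its current machine instance count and ignore the message on a mismatch.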

Circuit Padding Machine Message Management

Clients MAY send padding cells towards the relay before receiving the circpad_negotiated response, to allow for outbound cover traffic before negotiation completes.

Clients MAY send another circpad_negotiate cell before receiving the circpad_negotiated response, to allow for rapid machine changes.

Relays MUST NOT send padding cells or circpad_negotiated cells, unless a padding machine is active. Any padding-related cells that arrive at the client from unexpected relay sources are protocol violations, and clients MAY immediately tear down such circuits to avoid side channel risk.

Obfuscating client-side onion service circuit setup

The circuit padding currently deployed in Tor attempts to hide client-side onion service circuit setup. Service-side setup is not covered, because doing so would involve significantly more overhead, and/or require interaction with the application layer.

The approach taken aims to make client-side introduction and rendezvous circuits match the cell direction sequence and cell count of 3 hop general circuits used for normal web traffic, for the first 10 cells only. The lifespan of introduction circuits is also made to match the lifespan of general circuits.

Note that inter-arrival timing is not obfuscated by this defense.

Common general circuit construction sequences

Most general Tor circuits used to surf the web or download directory information start with the following 6-cell relay cell sequence (cells surrounded in [brackets] are outgoing, the others are incoming):

[EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED

When this is done, the client has established a 3-hop circuit and also opened a stream to the other end. Usually after this comes a series of DATA cells that either fetch pages, establish an SSL connection, or fetch directory information:

[DATA] -> [DATA] -> DATA -> DATA...(inbound cells continue)

The above stream of 10 relay cells defines the vast majority of general circuits that come out of Tor Browser during our testing, and it's what we use to make introduction and rendezvous circuits blend in.

Please note that in this section we only investigate relay cells and not connection-level cells like CREATE/CREATED or AUTHENTICATE/etc. that are used during the link-layer handshake. The rationale is that connection-level cells depend on the type of guard used and are not an effective fingerprint for a network/guard-level adversary.

Client-side onion service introduction circuit obfuscation

Two circuit padding machines work to hide client-side introduction circuits: one machine at the origin, and one machine at the second hop of the circuit. Each machine sends padding towards the other. The padding from the origin-side machine terminates at the second hop and does not get forwarded to the actual introduction point.

From Section 3.3.1 above, most general circuits have the following initial relay cell sequence (outgoing cells marked in [brackets]):

    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
      -> [DATA] -> [DATA] -> DATA -> DATA...(inbound data cells continue)

Whereas normal introduction circuits usually look like:

    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2
      -> [INTRO1] -> INTRODUCE_ACK

This means that up to the sixth cell (first line of each sequence above), both general and intro circuits have identical cell sequences. After that we want to mimic the second line sequence of

-> [DATA] -> [DATA] -> DATA -> DATA...(inbound data cells continue)

We achieve this by starting padding after INTRODUCE1 has been sent. With padding negotiation cells, in the common case the second line looks like:

-> [INTRO1] -> [PADDING_NEGOTIATE] -> PADDING_NEGOTIATED -> INTRO_ACK

Then, the middle node will send between INTRO_MACHINE_MINIMUM_PADDING (7) and INTRO_MACHINE_MAXIMUM_PADDING (10) cells, to match the "...(inbound data cells continue)" portion of the trace (aka the rest of an HTTPS response body).

We also set a special flag which keeps the circuit open even after the introduction is performed. With this feature the circuit will stay alive for the same duration as normal web circuits before they expire (usually 10 minutes).

Client-side rendezvous circuit hiding

Following a similar argument as for intro circuits, we are aiming for padded rendezvous circuits to blend in with the initial cell sequence of general circuits which usually look like this:

    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
      -> [DATA] -> [DATA] -> DATA -> DATA...(incoming cells continue)

Whereas normal rendezvous circuits usually look like:

    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EST_REND] -> REND_EST
      -> REND2 -> [BEGIN]

This means that up to the sixth cell (the first line), both general and rend circuits have identical cell sequences.

After that we want to mimic a [DATA] -> [DATA] -> DATA -> DATA sequence.

With padding negotiation right after the REND_ESTABLISHED, the sequence becomes:

    [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EST_REND] -> REND_EST
      -> [PADDING_NEGOTIATE] -> [DROP] -> PADDING_NEGOTIATED -> DROP...

After which normal application DATA cells continue on the circuit.

Hence this way we make rendezvous circuits look like general circuits up till the end of the circuit setup.

After that our machine gets deactivated, and we let the actual rendezvous circuit shape the traffic flow. Since rendezvous circuits usually imitate general circuits (their purpose is to surf the web), we can expect that they will look alike.

Circuit setup machine overhead

For the intro circuit case, we see that the origin-side machine just sends a single [PADDING_NEGOTIATE] cell, whereas the relay-side machine sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means that the average overhead of this machine is 11 padding cells per introduction circuit.

For the rend circuit case, this machine is quite light. Both sides send 2 padding cells, for a total of 4 padding cells.

Circuit padding consensus parameters

The circuit padding system has a handful of consensus parameters that can either disable circuit padding entirely, or rate limit the total overhead at relays and clients.

  * circpad_padding_disabled
    - If set to 1, no circuit padding machines will negotiate, and all
      current padding machines will cease padding immediately.
    - Default: 0

  * circpad_padding_reduced
    - If set to 1, only circuit padding machines marked as
      "reduced"/"low overhead" will be used. (Currently no such machines
      are marked as "reduced overhead").
    - Default: 0

  * circpad_global_allowed_cells
    - This is the number of padding cells that must be sent before the
      'circpad_global_max_padding_percent' parameter is applied.
    - Default: 0

  * circpad_global_max_padding_percent
    - This is the maximum ratio of padding cells to total cells,
      specified as a percent. If the global ratio of padding cells to
      total cells across all circuits exceeds this percent value, no
      more padding is sent until the ratio becomes lower. 0 means no
      limit.
    - Default: 0

  * circpad_max_circ_queued_cells
    - This is the maximum number of cells that can be in the circuitmux
      queue before padding stops being sent on that circuit.
    - Default: CIRCWINDOW_START_MAX (1000)
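The interaction between circpad_global_allowed_cells and circpad_global_max_padding_percent can be sketched as below. This is one reading of the parameter descriptions above, not the mainline implementation: the function name and the explicit counters are hypothetical, and treating a ratio exactly equal to the limit as allowed is an assumption.

```python
def global_padding_allowed(padding_sent, total_sent,
                           allowed_cells=0, max_padding_percent=0):
    """Return True if another padding cell may be sent under the
    global rate limit.

    padding_sent: padding cells sent so far across all circuits.
    total_sent:   all cells sent so far (padding and non-padding).
    """
    # The percentage limit is not applied until this many padding
    # cells have been sent.
    if padding_sent < allowed_cells:
        return True
    # 0 means no limit.
    if max_padding_percent == 0:
        return True
    if total_sent == 0:
        return True
    # Integer comparison of padding_sent / total_sent against the limit.
    return padding_sent * 100 <= max_padding_percent * total_sent
```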

Acknowledgments

This research was supported in part by NSF grants CNS-1111539, CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548.

1. https://en.wikipedia.org/wiki/NetFlow
2. http://infodoc.alcatel-lucent.com/html/0_add-h-f/93-0073-10-01/7750_SR_OS_Router_Configuration_Guide/Cflowd-CLI.html
3. http://www.cisco.com/en/US/docs/ios/12_3t/netflow/command/reference/nfl_a1gt_ps5207_TSD_Products_Command_Reference_Chapter.html#wp1185203
4. http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/70974-netflow-catalyst6500.html#opconf
5. https://www.juniper.net/techpubs/software/erx/junose60/swconfig-routing-vol1/html/ip-jflow-stats-config4.html#560916
6. http://www.jnpr.net/techpubs/en_US/junos15.1/topics/reference/configuration-statement/flow-active-timeout-edit-forwarding-options-po.html
7. http://www.jnpr.net/techpubs/en_US/junos15.1/topics/reference/configuration-statement/flow-active-timeout-edit-forwarding-options-po.html
8. http://www.h3c.com/portal/Technical_Support___Documents/Technical_Documents/Switches/H3C_S9500_Series_Switches/Command/Command/H3C_S9500_CM-Release1648%5Bv1.24%5D-System_Volume/200901/624854_1285_0.htm#_Toc217704193
9. http://docs-legacy.fortinet.com/fgt/handbook/cli52_html/FortiOS%205.2%20CLI/config_system.23.046.html
10. http://wiki.mikrotik.com/wiki/Manual:IP/Traffic_Flow
11. https://metrics.torproject.org/dirbytes.html
12. http://freehaven.net/anonbib/cache/murdoch-pet2007.pdf
13. https://gitweb.torproject.org/torspec.git/tree/proposals/188-bridge-guards.txt
14. http://www.ntop.org/wp-content/uploads/2013/03/nProbe_UserGuide.pdf
15. http://arxiv.org/pdf/1512.00524
16. https://www.cs.kau.se/pulls/hot/thebasketcase-ape/
17. https://github.com/torproject/tor/tree/master/doc/HACKING/CircuitPaddingDevelopment.md
18. https://www.usenix.org/node/190967
    https://blog.torproject.org/technical-summary-usenix-fingerprinting-paper

Denial-of-service prevention mechanisms in Tor

This document is incomplete; it describes some mechanisms that Tor uses to avoid different kinds of denial-of-service attacks.

Handling low-memory conditions

(See also tor-spec.txt, section 8.1.)

The Tor protocol requires clients, onion services, relays, and authorities to store various kinds of information in buffers and caches. But an attacker can use these buffers and queues to exhaust the memory of a targeted Tor process, and force the operating system to kill that process.

Worse still, the ability to kill targeted Tor instances can be used to facilitate traffic analysis. (For example, see the "Sniper Attack" paper by Jansen, Tschorsch, Johnson, and Scheuermann.)

With this in mind, any Tor implementation—especially one that runs as a relay or onion service—must take steps to prevent memory-based denial-of-service attacks.

Detecting low memory

The easiest way to notice you're out of memory would, in theory, be getting an error when you try to allocate more. Unfortunately, some systems (e.g. Linux) won't actually give you an "out of memory" error when you're low on memory. Instead, they overcommit and promise you memory that they can't actually provide… and then later on, they might kill processes that actually try to use more memory than they wish they'd given out.

So in practice, the mainline Tor implementation uses a different strategy. It uses a self-imposed "MaxMemInQueues" value as an upper bound for how much memory it's willing to allocate to certain kinds of queued usages. This value can either be set by the user, or derived from a fraction of the total amount of system RAM.

As of Tor 0.4.7.x, the MaxMemInQueues mechanism tracks the following kinds of allocation:

  * Cells queued on circuits.
  * Per-connection read or write buffers.
  * On-the-fly compression or decompression state.
  * Half-open stream records.
  * Cached onion service descriptors (hsdir only).
  * Cached DNS resolves (relay only).
  * GEOIP-based usage activity statistics.

Note that directory caches aren't counted, since those are stored on disk and accessed via mmap.

Responding to low memory

If our allocations exceed MaxMemInQueues, then we take the following steps to reduce our memory allocation.

Freeing from caches: For each of our onion service descriptor cache, our DNS cache, and our GEOIP statistics cache, we check whether they account for greater than 20% of our total allocation. If they do, we free memory from the offending cache until the total remaining is no more than 10% of our total allocation.

When freeing entries from a cache, we aim to free (approximately) the oldest entries first.
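The 20%/10% cache rule can be sketched as a small helper. This is an illustration under an assumption: both thresholds are measured against the total allocation at the time of the check (before anything is freed), which the text above does not state explicitly. The function name is hypothetical.

```python
def cache_bytes_to_free(cache_bytes, total_allocation_bytes):
    """Return how many bytes to free from one cache.

    A cache is only trimmed if it accounts for more than 20% of the
    total allocation; it is then trimmed down to 10% of that total.
    """
    high_mark = total_allocation_bytes // 5    # 20%
    low_mark = total_allocation_bytes // 10    # 10%
    if cache_bytes <= high_mark:
        return 0
    return cache_bytes - low_mark
```

The same helper would be applied in turn to the onion service descriptor cache, the DNS cache, and the GEOIP statistics cache, freeing (approximately) the oldest entries first.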

Freeing from buffers: After freeing data from caches, we see whether allocations are still above 90% of MaxMemInQueues. If they are, we try to close circuits and connections until we are below 90% of MaxMemInQueues.

When deciding which circuits to free, we sort them based on the age of the oldest data in their queues, and free the ones with the oldest data. (For example, a circuit on which a single cell has been queued for 5 minutes would be freed before a circuit where 100 cells have been queued for 5 seconds.) "Data queued on a circuit" includes all data that we could drop if the circuit were destroyed: not only the cells on the circuit's cell queue, but also any bytes queued in buffers associated with streams or half-stream records attached to the circuit.

We free non-tunneled directory connections by a similar rule, based on the age of their oldest queued data.

Upon freeing a circuit, a "DESTROY cell" must be sent in both directions.
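The circuit selection rule above can be sketched as follows. The Circuit record and function name are hypothetical, and the sketch only chooses victims; actually closing a circuit (and sending DESTROY cells in both directions) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    circ_id: int
    oldest_queued_age_s: float  # age of the oldest data queued on the circuit
    queued_bytes: int           # all data that would be dropped on destroy


def choose_circuits_to_free(circuits, current_bytes, max_mem_in_queues):
    """Pick circuits to close, oldest queued data first, until the
    allocation is no longer above 90% of MaxMemInQueues."""
    target = max_mem_in_queues * 9 // 10
    victims = []
    by_age = sorted(circuits, key=lambda c: c.oldest_queued_age_s,
                    reverse=True)
    for circ in by_age:
        if current_bytes <= target:
            break
        victims.append(circ)
        current_bytes -= circ.queued_bytes
    return victims
```

Note that the sort key is the age of the oldest queued data, not the amount of data, so a circuit holding one stale cell is chosen before a busy circuit with a large but fresh queue.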

Reporting low memory.

We define a "low threshold" equal to 3/4 of MaxMemInQueues. Every time our memory usage is above the low threshold, we record ourselves as being "under memory pressure".

(This is not currently reported.)

Tor's extensions to the SOCKS protocol

Table of Contents

    1. Overview
        1.1. Extent of support
    2. Name lookup
    3. Other command extensions.
    4. HTTP-resistance
    5. Optimistic data
    6. Extended error codes

Overview

The SOCKS protocol provides a generic interface for TCP proxies. Client software connects to a SOCKS server via TCP, and requests a TCP connection to another address and port. The SOCKS server establishes the connection, and reports success or failure to the client. After the connection has been established, the client application uses the TCP stream as usual.

Tor supports SOCKS4 as defined in [1], SOCKS4A as defined in [2], and SOCKS5 as defined in [3] and [4].

The stickiest issue for Tor in supporting clients, in practice, is forcing DNS lookups to occur at the OR side: if clients do their own DNS lookup, the DNS server can learn which addresses the client wants to reach. SOCKS4 supports addressing by IPv4 address; SOCKS4A is a kludge on top of SOCKS4 to allow addressing by hostname; SOCKS5 supports IPv4, IPv6, and hostnames.

Extent of support

Tor supports the SOCKS4, SOCKS4A, and SOCKS5 standards, except as follows:

BOTH:

  - The BIND command is not supported.

SOCKS4,4A:

  - SOCKS4 usernames are used to implement stream isolation.

SOCKS5:

  - The (SOCKS5) "UDP ASSOCIATE" command is not supported.
  - SOCKS5 BIND command is not supported.
  - IPv6 is not supported in CONNECT commands.
  - SOCKS5 GSSAPI subnegotiation is not supported.
  - The "NO AUTHENTICATION REQUIRED" (SOCKS5) authentication method [00]
    is supported; and as of Tor 0.2.3.2-alpha, the "USERNAME/PASSWORD"
    (SOCKS5) authentication method [02] is supported too, and used as a
    method to implement stream isolation. As an extension to support
    some broken clients, we allow clients to pass a "USERNAME/PASSWORD"
    authentication message to us even if no authentication was selected.
    Furthermore, we allow the username/password fields of this message
    to be empty. This technically violates RFC1929 [4], but ensures
    interoperability with somewhat broken SOCKS5 client implementations.
  - Custom reply error code. The "REP" field, as per the RFC[3], has
    unassigned values which are used to describe Tor internal errors.
    See ExtendedErrors in the tor.1 man page for more details. It is
    only sent back if this SocksPort flag is set.

(For more information on stream isolation, see IsolateSOCKSAuth on the Tor manpage.)

Name lookup

As an extension to SOCKS4A and SOCKS5, Tor implements a new command value, "RESOLVE" [F0]. When Tor receives a "RESOLVE" SOCKS command, it initiates a remote lookup of the hostname provided as the target address in the SOCKS request. The reply is either an error (if the address couldn't be resolved) or a success response. In the case of success, the address is stored in the portion of the SOCKS response reserved for remote IP address.

(We support RESOLVE in SOCKS4 too, even though it is unnecessary.)

For SOCKS5 only, we support reverse resolution with a new command value, "RESOLVE_PTR" [F1]. In response to a "RESOLVE_PTR" SOCKS5 command with an IPv4 address as its target, Tor attempts to find the canonical hostname for that IPv4 record, and returns it in the "server bound address" portion of the reply. (This command was not supported before Tor 0.1.2.2-alpha.)
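As an illustration, the SOCKS5 request messages for these two commands could be built as below. The sketch assumes the standard SOCKS5 request layout (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT) with only the command byte changed, and it sets the port field to zero as a placeholder, which is an assumption rather than something this document specifies. Method negotiation and reply parsing are omitted, and the function names are hypothetical.

```python
import ipaddress
import struct

SOCKS5_VERSION = 0x05
CMD_RESOLVE = 0xF0
CMD_RESOLVE_PTR = 0xF1
ATYP_IPV4 = 0x01
ATYP_DOMAINNAME = 0x03


def build_resolve_request(hostname):
    """SOCKS5 request asking Tor to resolve a hostname remotely."""
    name = hostname.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1..255 bytes")
    header = bytes([SOCKS5_VERSION, CMD_RESOLVE, 0x00,
                    ATYP_DOMAINNAME, len(name)])
    # Port set to 0 as a placeholder (assumption).
    return header + name + struct.pack(">H", 0)


def build_resolve_ptr_request(ipv4_address):
    """SOCKS5 request asking Tor for a reverse lookup of an IPv4 address."""
    packed = ipaddress.IPv4Address(ipv4_address).packed
    header = bytes([SOCKS5_VERSION, CMD_RESOLVE_PTR, 0x00, ATYP_IPV4])
    return header + packed + struct.pack(">H", 0)
```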

Other command extensions.

Tor 0.1.2.4-alpha added a new command value: "CONNECT_DIR" [F2]. In this case, Tor will open an encrypted direct TCP connection to the directory port of the Tor server specified by address:port (the port specified should be the ORPort of the server). It uses a one-hop tunnel and a "BEGIN_DIR" relay cell to accomplish this secure connection.

The F2 command value was removed in Tor 0.2.0.10-alpha in favor of a new use_begindir flag in edge_connection_t.

HTTP-resistance

Tor checks the first byte of each SOCKS request to see whether it looks more like an HTTP request (that is, it starts with a "G", "H", or "P"). If so, Tor returns a small webpage, telling the user that their browser is misconfigured. This is helpful for the many users who mistakenly try to use Tor as an HTTP proxy instead of a SOCKS proxy.
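The first-byte check described above can be sketched as follows. This is only an illustration of the rule as stated in the text; the function name looks_like_http is hypothetical and is not taken from the Tor source.

```python
def looks_like_http(request: bytes) -> bool:
    """Return True if the first byte suggests an HTTP request, not SOCKS.

    "G", "H", and "P" cover methods such as GET, HEAD, POST and PUT;
    SOCKS4 and SOCKS5 requests start with the version byte 0x04 or 0x05.
    """
    return request[:1] in (b"G", b"H", b"P")
```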

Optimistic data

Tor allows SOCKS clients to send connection data before Tor has sent a SOCKS response. When using an exit node that supports "optimistic data", Tor will send such data to the server without waiting to see whether the connection attempt succeeds. This behavior can save a single round-trip time when starting connections with a protocol where the client speaks first (like HTTP). Clients that do this must be ready to hear that their connection has succeeded or failed after they have sent the data.

Extended error codes

We define a set of additional extension error codes that can be returned by our SOCKS implementation in response to failed onion service connections.

(In the C Tor implementation, these error codes can be disabled via the ExtendedErrors flag. In Arti, these error codes are enabled whenever onion services are.)

* X'F0' Onion Service Descriptor Can Not be Found

  The requested onion service descriptor can't be found on the hashring and thus the service is not reachable by the client.

* X'F1' Onion Service Descriptor Is Invalid

  The requested onion service descriptor can't be parsed or signature validation failed.

* X'F2' Onion Service Introduction Failed

  The client failed to introduce to the service, meaning the descriptor was found but the service is no longer at the introduction points. The service has likely changed its descriptor or is not running.

* X'F3' Onion Service Rendezvous Failed

  The client failed to rendezvous with the service, which means that the client is unable to finalize the connection.

* X'F4' Onion Service Missing Client Authorization

  Tor was able to download the requested onion service descriptor but is unable to decrypt its content because it is missing client authorization information for it.

* X'F5' Onion Service Wrong Client Authorization

  Tor was able to download the requested onion service descriptor but is unable to decrypt its content using the client authorization information it has. This means the client's access was revoked.

* X'F6' Onion Service Invalid Address

  The given .onion address is invalid. This error is returned in any of these cases: the address checksum doesn't match, the ed25519 public key is invalid, or the encoding is invalid.

* X'F7' Onion Service Introduction Timed Out

  Similar to the X'F2' code, but in this case all introduction attempts have failed due to a timeout.

(Note that not all of the above error codes are currently returned by Arti as of August 2023.)

References:

[1] http://en.wikipedia.org/wiki/SOCKS#SOCKS4
[2] http://en.wikipedia.org/wiki/SOCKS#SOCKS4a
[3] SOCKS5: RFC 1928 https://www.ietf.org/rfc/rfc1928.txt
[4] RFC 1929: https://www.ietf.org/rfc/rfc1929.txt

Special Hostnames in Tor

                               Nick Mathewson

Table of Contents

    1. Overview
    2. .exit
    3. .onion
    4. .noconnect

Overview

Most of the time, Tor treats user-specified hostnames as opaque: When the user connects to www.torproject.org, Tor picks an exit node and uses that node to connect to "www.torproject.org". Some hostnames, however, can be used to override Tor's default behavior and circuit-building rules.

These hostnames can be passed to Tor as the address part of a SOCKS4a or SOCKS5 request. If the application is connected to Tor using an IP-only method (such as SOCKS4, TransPort, or NATDPort), these hostnames can be substituted for certain IP addresses using the MapAddress configuration option or the MAPADDRESS control command.

.exit

SYNTAX:

    [hostname].[name-or-digest].exit
    [name-or-digest].exit

Hostname is a valid hostname; [name-or-digest] is either the nickname of a Tor node or the hex-encoded digest of that node's public key.

When Tor sees an address in this format, it uses the node identified by [name-or-digest] as the exit node and connects through it to the specified hostname. If no "hostname" component is given, Tor defaults to the published IPv4 address of the exit node.

It is valid to try to resolve hostnames, and in fact upon success Tor will cache an internal mapaddress of the form "www.google.com.foo.exit=64.233.161.99.foo.exit" to speed subsequent lookups.

The .exit notation is disabled by default as of Tor 0.2.2.1-alpha, due to potential application-level attacks.

EXAMPLES:

    www.example.com.exampletornode.exit

        Connect to www.example.com from the node called "exampletornode".

    exampletornode.exit

        Connect to the published IP address of "exampletornode" using "exampletornode" as the exit.
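To make the two syntactic forms concrete, here is a minimal sketch that splits a .exit address into its optional hostname and its node component. The helper name parse_exit_address is hypothetical, and the sketch does not validate that the node component is a real nickname or hex digest.

```python
from typing import Optional, Tuple


def parse_exit_address(address: str) -> Optional[Tuple[Optional[str], str]]:
    """Split "[hostname].[name-or-digest].exit" into (hostname, node).

    Returns None if the address is not in .exit form. The hostname is
    None when only "[name-or-digest].exit" was given.
    """
    suffix = ".exit"
    if not address.endswith(suffix):
        return None
    body = address[: -len(suffix)]
    # The last label before ".exit" names the node; the rest is the hostname.
    hostname, _, node = body.rpartition(".")
    if not node:
        return None
    return (hostname or None, node)
```

With the examples above, the first address yields ("www.example.com", "exampletornode") and the second yields (None, "exampletornode").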

.onion

SYNTAX:

    [digest].onion
    [ignored].[digest].onion

For version 2 addresses (deprecated since 0.4.6.1-alpha), the digest is the first eighty bits of a SHA1 hash of the identity key for a hidden service, encoded in base32.

For version 3 addresses, the digest is defined as:

    onion_address = base32(PUBKEY | CHECKSUM | VERSION)
    CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2]

    where:
      - PUBKEY is the 32 bytes ed25519 master pubkey of the onion service.
      - VERSION is a one byte version field (default value '\x03')
      - ".onion checksum" is a constant string
      - H is SHA3-256
      - CHECKSUM is truncated to two bytes before inserting it in onion_address
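The version 3 encoding above can be sketched in a few lines of Python using only the standard library. The function name onion_address_v3 is hypothetical; lowercasing the base32 output and appending ".onion" are assumptions of this sketch, following the usual appearance of onion addresses.

```python
import base64
import hashlib


def onion_address_v3(pubkey: bytes, version: int = 3) -> str:
    """Encode a 32-byte ed25519 public key as a v3 .onion hostname."""
    if len(pubkey) != 32:
        raise ValueError("PUBKEY must be exactly 32 bytes")
    version_byte = bytes([version])
    # CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2], with H = SHA3-256
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version_byte).digest()[:2]
    # 32 + 2 + 1 = 35 bytes, which base32-encodes to 56 characters with no padding.
    encoded = base64.b32encode(pubkey + checksum + version_byte).decode("ascii")
    return encoded.lower() + ".onion"
```

Because the version byte comes last, the final five bits of the encoding are fixed by it, which is why version 3 addresses end in the letter "d".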

When Tor sees an address in this format, it tries to look up and connect to the specified onion service. See rend-spec-v3.txt for full details.

The "ignored" portion of the address is intended for use in vhosting, and is supported in Tor 0.2.4.10-alpha and later.

.noconnect

SYNTAX: [string].noconnect

When Tor sees an address in this format, it immediately closes the connection without attaching it to any circuit. This is useful for controllers that want to test whether a given application is indeed using the same instance of Tor that they're controlling.

This feature was added in Tor 0.1.2.4-alpha, and taken out in Tor 0.2.2.1-alpha over fears that it provided another avenue for detecting Tor users via application-level web tricks.

Tor Rendezvous Specification - Version 3

This document specifies how the hidden service version 3 protocol works. This text used to be proposal 224-rend-spec-ng.txt.

Table of contents:

    0. Hidden services: overview and preliminaries.
        0.1. Improvements over previous versions.
        0.2. Notation and vocabulary
        0.3. Cryptographic building blocks
        0.4. Protocol building blocks [BUILDING-BLOCKS]
        0.5. Assigned relay cell types
        0.6. Acknowledgments
    1. Protocol overview
        1.1. View from 10,000 feet
        1.2. In more detail: naming hidden services [NAMING]
        1.3. In more detail: Access control [IMD:AC]
        1.4. In more detail: Distributing hidden service descriptors. [IMD:DIST]
        1.5. In more detail: Scaling to multiple hosts
        1.6. In more detail: Backward compatibility with older hidden service protocols
        1.7. In more detail: Keeping crypto keys offline
        1.8. In more detail: Encryption Keys And Replay Resistance
        1.9. In more detail: A menagerie of keys
            1.9.1. In even more detail: Client authorization [CLIENT-AUTH]
    2. Generating and publishing hidden service descriptors [HSDIR]
        2.1. Deriving blinded keys and subcredentials [SUBCRED]
        2.2. Locating, uploading, and downloading hidden service descriptors
            2.2.1. Dividing time into periods [TIME-PERIODS]
            2.2.2. When to publish a hidden service descriptor [WHEN-HSDESC]
            2.2.3. Where to publish a hidden service descriptor [WHERE-HSDESC]
            2.2.4. Using time periods and SRVs to fetch/upload HS descriptors
            2.2.5. Expiring hidden service descriptors [EXPIRE-DESC]
            2.2.6. URLs for anonymous uploading and downloading
        2.3. Publishing shared random values [PUB-SHAREDRANDOM]
            2.3.1. Client behavior in the absence of shared random values
            2.3.2. Hidden services and changing shared random values
        2.4. Hidden service descriptors: outer wrapper [DESC-OUTER]
        2.5. Hidden service descriptors: encryption format [HS-DESC-ENC]
            2.5.1. First layer of encryption [HS-DESC-FIRST-LAYER]
                2.5.1.1. First layer encryption logic
                2.5.1.2. First layer plaintext format
                2.5.1.3. Client behavior
                2.5.1.4. Obfuscating the number of authorized clients
            2.5.2. Second layer of encryption [HS-DESC-SECOND-LAYER]
                2.5.2.1. Second layer encryption keys
                2.5.2.2. Second layer plaintext format
            2.5.3. Deriving hidden service descriptor encryption keys [HS-DESC-ENCRYPTION-KEYS]
    3. The introduction protocol [INTRO-PROTOCOL]
        3.1. Registering an introduction point [REG_INTRO_POINT]
            3.1.1. Extensible ESTABLISH_INTRO protocol. [EST_INTRO]
                3.1.1.1. Denial-of-Service Defense Extension. [EST_INTRO_DOS_EXT]
            3.1.2. Registering an introduction point on a legacy Tor node [LEGACY_EST_INTRO]
            3.1.3. Acknowledging establishment of introduction point [INTRO_ESTABLISHED]
        3.2. Sending an INTRODUCE1 cell to the introduction point. [SEND_INTRO1]
            3.2.1. INTRODUCE1 cell format [FMT_INTRO1]
            3.2.2. INTRODUCE_ACK cell format. [INTRO_ACK]
        3.3. Processing an INTRODUCE2 cell at the hidden service. [PROCESS_INTRO2]
            3.3.1. Introduction handshake encryption requirements [INTRO-HANDSHAKE-REQS]
            3.3.2. Example encryption handshake: ntor with extra data [NTOR-WITH-EXTRA-DATA]
        3.4. Authentication during the introduction phase. [INTRO-AUTH]
            3.4.1. Ed25519-based authentication.
    4. The rendezvous protocol
        4.1. Establishing a rendezvous point [EST_REND_POINT]
        4.2. Joining to a rendezvous point [JOIN_REND]
            4.2.1. Key expansion
        4.3. Using legacy hosts as rendezvous points
    5. Encrypting data between client and host
    6. Encoding onion addresses [ONIONADDRESS]
    7. Open Questions:

Draft notes

This document describes a proposed design and specification for hidden services in Tor version 0.2.5.x or later. It's a replacement for the current rend-spec.txt, rewritten for clarity and for improved design.

Look for the string "TODO" below: it describes gaps or uncertainties in the design.

Change history:

2013-11-29: Proposal first numbered. Some TODO and XXX items remain.

2014-01-04: Clarify some unclear sections.

2014-01-21: Fix a typo.

2014-02-20: Move more things to the revised certificate format in the new updated proposal 220.

2015-05-26: Fix two typos.

Hidden services: overview and preliminaries.

Hidden services aim to provide responder anonymity for bidirectional stream-based communication on the Tor network. Unlike regular Tor connections, where the connection initiator receives anonymity but the responder does not, hidden services attempt to provide bidirectional anonymity.

Participants:

Operator -- A person running a hidden service

Host, "Server" -- The Tor software run by the operator to provide a hidden service.

User -- A person contacting a hidden service.

Client -- The Tor software running on the User's computer

Hidden Service Directory (HSDir) -- A Tor node that hosts signed statements from hidden service hosts so that users can make contact with them.

Introduction Point -- A Tor node that accepts connection requests for hidden services and anonymously relays those requests to the hidden service.

Rendezvous Point -- A Tor node to which clients and servers connect and which relays traffic between them.

Improvements over previous versions.

Here is a list of improvements of this proposal over the legacy hidden services:

a) Better crypto (replaced SHA1/DH/RSA1024 with SHA3/ed25519/curve25519)
b) Improved directory protocol leaking less to directory servers.
c) Improved directory protocol with smaller surface for targeted attacks.
d) Better onion address security against impersonation.
e) More extensible introduction/rendezvous protocol.
f) Offline keys for onion services
g) Advanced client authorization

Notation and vocabulary

Unless specified otherwise, all multi-octet integers are big-endian.

We write sequences of bytes in two ways:

1. A sequence of two-digit hexadecimal values in square brackets, as in [AB AD 1D EA].
2. A string of characters enclosed in quotes, as in "Hello". The characters in these strings are encoded in their ascii representations; strings are NOT nul-terminated unless explicitly described as NUL terminated.

We use the words "byte" and "octet" interchangeably.

We use the vertical bar | to denote concatenation.

We use INT_N(val) to denote the network (big-endian) encoding of the unsigned integer "val" in N bytes. For example, INT_4(1337) is [00 00 05 39]. Values are truncated like so: val % (2 ^ (N * 8)). For example, INT_4(42) is 42 % 4294967296 (32 bit).
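The INT_N notation, including its truncation rule, maps directly onto a short Python helper. The name int_n is chosen for this sketch only.

```python
def int_n(val: int, n: int) -> bytes:
    """Return INT_N(val): `val` truncated to N bytes, in network byte order."""
    # Truncate as specified: val % (2 ^ (N * 8)), then encode big-endian.
    return (val % (1 << (n * 8))).to_bytes(n, "big")
```

For instance, int_n(1337, 4) gives the bytes [00 00 05 39] from the example above, and a value one larger than the 32-bit maximum wraps around to zero.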

Cryptographic building blocks

This specification uses the following cryptographic building blocks:

* A pseudorandom number generator backed by a strong entropy source. The output of the PRNG should always be hashed before being posted on the network to avoid leaking raw PRNG bytes to the network (see [PRNG-REFS]).

* A stream cipher STREAM(iv, k) where iv is a nonce of length S_IV_LEN bytes and k is a key of length S_KEY_LEN bytes.

* A public key signature system SIGN_KEYGEN()->seckey, pubkey; SIGN_SIGN(seckey,msg)->sig; and SIGN_CHECK(pubkey, sig, msg) -> { "OK", "BAD" }; where secret keys are of length SIGN_SECKEY_LEN bytes, public keys are of length SIGN_PUBKEY_LEN bytes, and signatures are of length SIGN_SIG_LEN bytes.

  This signature system must also support key blinding operations as discussed in appendix [KEYBLIND] and in section [SUBCRED]: SIGN_BLIND_SECKEY(seckey, blind)->seckey2 and SIGN_BLIND_PUBKEY(pubkey, blind)->pubkey2 .

* A public key agreement system "PK", providing PK_KEYGEN()->seckey, pubkey; PK_VALID(pubkey) -> {"OK", "BAD"}; and PK_HANDSHAKE(seckey, pubkey)->output; where secret keys are of length PK_SECKEY_LEN bytes, public keys are of length PK_PUBKEY_LEN bytes, and the handshake produces outputs of length PK_OUTPUT_LEN bytes.

* A cryptographic hash function H(d), which should be preimage and collision resistant. It produces hashes of length HASH_LEN bytes.

* A cryptographic message authentication code MAC(key,msg) that produces outputs of length MAC_LEN bytes.

* A key derivation function KDF(message, n) that outputs n bytes.

As a first pass, I suggest:

* Instantiate STREAM with AES256-CTR.

* Instantiate SIGN with Ed25519 and the blinding protocol in [KEYBLIND].

* Instantiate PK with Curve25519.

* Instantiate H with SHA3-256.

* Instantiate KDF with SHAKE-256.

* Instantiate MAC(key=k, message=m) with H(k_len | k | m), where k_len is htonll(len(k)).

When we need a particular MAC key length below, we choose MAC_KEY_LEN=32 (256 bits).
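As a rough illustration of the suggested MAC instantiation, the sketch below computes H(k_len | k | m) with H = SHA3-256 and k_len encoded as an 8-byte big-endian integer, which is what htonll produces for a 64-bit length. The function name mac is used here only for the example.

```python
import hashlib


def mac(key: bytes, msg: bytes) -> bytes:
    """Compute MAC(key, msg) = SHA3-256(k_len | key | msg)."""
    # htonll(len(k)): the key length as a 64-bit integer in network byte order.
    k_len = len(key).to_bytes(8, "big")
    return hashlib.sha3_256(k_len + key + msg).digest()
```

Prefixing the key length keeps the boundary between key and message unambiguous, so moving a byte from the message into the key changes the output.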

For legacy purposes, we specify compatibility with older versions of the Tor introduction point and rendezvous point protocols. These used RSA1024, DH1024, AES128, and SHA1, as discussed in rend-spec.txt.

As in [proposal 220], all signatures are generated not over strings themselves, but over those strings prefixed with a distinguishing value.

Protocol building blocks [BUILDING-BLOCKS]

In sections below, we need to transmit the locations and identities of Tor nodes. We do so in the link identification format used by EXTEND2 cells in the Tor protocol.

    NSPEC   (Number of link specifiers)    [1 byte]
    NSPEC times:
      LSTYPE (Link specifier type)         [1 byte]
      LSLEN  (Link specifier length)       [1 byte]
      LSPEC  (Link specifier)              [LSLEN bytes]

Link specifier types are as described in tor-spec.txt. Every set of link specifiers SHOULD include at minimum specifiers of type [00] (TLS-over-TCP, IPv4), [02] (legacy node identity) and [03] (ed25519 identity key). Sets of link specifiers without these three types SHOULD be rejected.
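The link specifier list format above can be sketched as a small encoder. The helper names (encode_link_specifiers, ipv4_spec, has_required_types) are hypothetical, and the IPv4 payload layout (a 4-byte address followed by a 2-byte port) is taken from the description of link specifier type [00] in tor-spec.txt rather than from this section.

```python
import ipaddress
import struct
from typing import List, Tuple

LS_IPV4 = 0x00        # TLS-over-TCP, IPv4
LS_LEGACY_ID = 0x02   # legacy node identity
LS_ED25519_ID = 0x03  # ed25519 identity key

LinkSpec = Tuple[int, bytes]  # (LSTYPE, LSPEC)


def ipv4_spec(addr: str, port: int) -> LinkSpec:
    """Build a type [00] specifier: 4-byte address plus 2-byte port."""
    return (LS_IPV4, ipaddress.IPv4Address(addr).packed + struct.pack("!H", port))


def encode_link_specifiers(specs: List[LinkSpec]) -> bytes:
    """Encode NSPEC followed by (LSTYPE, LSLEN, LSPEC) for each specifier."""
    if len(specs) > 255:
        raise ValueError("NSPEC must fit in one byte")
    out = bytearray([len(specs)])
    for lstype, lspec in specs:
        if len(lspec) > 255:
            raise ValueError("LSLEN must fit in one byte")
        out += bytes([lstype, len(lspec)]) + lspec
    return bytes(out)


def has_required_types(specs: List[LinkSpec]) -> bool:
    """Check that types [00], [02] and [03] are all present."""
    present = {lstype for lstype, _ in specs}
    return {LS_IPV4, LS_LEGACY_ID, LS_ED25519_ID} <= present
```

A receiver could use a check like has_required_types to decide whether a set of link specifiers should be rejected.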

As of 0.4.1.1-alpha, Tor includes both IPv4 and IPv6 link specifiers in v3 onion service protocol link specifier lists. All available addresses SHOULD be included as link specifiers, regardless of the address that Tor actually used to connect/extend to the remote relay.

We also incorporate Tor's circuit extension handshakes, as used in the CREATE2 and CREATED2 cells described in tor-spec.txt. In these handshakes, a client who knows a public key for a server sends a message and receives a message from that server. Once the exchange is done, the two parties have a shared set of forward-secure key material, and the client knows that nobody else shares that key material unless they control the secret key corresponding to the server's public key.

Assigned relay cell types

These relay cell types are reserved for use in the hidden service protocol.

32 -- RELAY_COMMAND_ESTABLISH_INTRO

    Sent from hidden service host to introduction point; establishes introduction point. Discussed in [REG_INTRO_POINT].

33 -- RELAY_COMMAND_ESTABLISH_RENDEZVOUS

    Sent from client to rendezvous point; creates rendezvous point. Discussed in [EST_REND_POINT].

34 -- RELAY_COMMAND_INTRODUCE1

    Sent from client to introduction point; requests introduction. Discussed in [SEND_INTRO1]

35 -- RELAY_COMMAND_INTRODUCE2

    Sent from introduction point to hidden service host; requests introduction. Same format as INTRODUCE1. Discussed in [FMT_INTRO1] and [PROCESS_INTRO2]

36 -- RELAY_COMMAND_RENDEZVOUS1

    Sent from hidden service host to rendezvous point; attempts to join host's circuit to client's circuit. Discussed in [JOIN_REND]

37 -- RELAY_COMMAND_RENDEZVOUS2

    Sent from rendezvous point to client; reports join of host's circuit to client's circuit. Discussed in [JOIN_REND]

38 -- RELAY_COMMAND_INTRO_ESTABLISHED

    Sent from introduction point to hidden service host; reports status of attempt to establish introduction point. Discussed in [INTRO_ESTABLISHED]

39 -- RELAY_COMMAND_RENDEZVOUS_ESTABLISHED

    Sent from rendezvous point to client; acknowledges receipt of ESTABLISH_RENDEZVOUS cell. Discussed in [EST_REND_POINT]

40 -- RELAY_COMMAND_INTRODUCE_ACK

    Sent from introduction point to client; acknowledges receipt of INTRODUCE1 cell and reports success/failure. Discussed in [INTRO_ACK]

Acknowledgments

This design includes ideas from many people, including

Christopher Baines, Daniel J. Bernstein, Matthew Finkel, Ian Goldberg, George Kadianakis, Aniket Kate, Tanja Lange, Robert Ransom, Roger Dingledine, Aaron Johnson, Tim Wilson-Brown ("teor"), special (John Brooks), s7r

It's based on Tor's original hidden service design by Roger Dingledine, Nick Mathewson, and Paul Syverson, and on improvements to that design over the years by people including

Tobias Kamm, Thomas Lauterbach, Karsten Loesing, Alessandro Preite Martinez, Robert Ransom, Ferdinand Rieger, Christoph Weingarten, Christian Wilms,

We wouldn't be able to do any of this work without good attack designs from researchers including

Alex Biryukov, Lasse Øverlier, Ivan Pustogarov, Paul Syverson, Ralf-Philipp Weinmann,

See [ATTACK-REFS] for their papers.

Several of these ideas have come from conversations with

Christian Grothoff, Brian Warner, Zooko Wilcox-O'Hearn,

And if this document makes any sense at all, it's thanks to editing help from

Matthew Finkel, George Kadianakis, Peter Palfrader, Tim Wilson-Brown ("teor"),

[XXX Acknowledge the huge bunch of people working on 8106.] [XXX Acknowledge the huge bunch of people working on 8244.]

Please forgive me if I've missed you; please forgive me if I've misunderstood your best ideas here too.

Protocol overview

In this section, we outline the hidden service protocol. This section omits some details in the name of simplicity; those are given more fully below, when we specify the protocol in more detail.

View from 10,000 feet

A hidden service host prepares to offer a hidden service by choosing several Tor nodes to serve as its introduction points. It builds circuits to those nodes, and tells them to forward introduction requests to it using those circuits.

Once introduction points have been picked, the host builds a set of documents called "hidden service descriptors" (or just "descriptors" for short) and uploads them to a set of HSDir nodes. These documents list the hidden service's current introduction points and describe how to make contact with the hidden service.

When a client wants to connect to a hidden service, it first chooses a Tor node at random to be its "rendezvous point" and builds a circuit to that rendezvous point. If the client does not have an up-to-date descriptor for the service, it contacts an appropriate HSDir and requests such a descriptor.

The client then builds an anonymous circuit to one of the hidden service's introduction points listed in its descriptor, and gives the introduction point an introduction request to pass to the hidden service. This introduction request includes the target rendezvous point and the first part of a cryptographic handshake.

Upon receiving the introduction request, the hidden service host makes an anonymous circuit to the rendezvous point and completes the cryptographic handshake. The rendezvous point connects the two circuits, and the cryptographic handshake gives the two parties a shared key and proves to the client that it is indeed talking to the hidden service.

Once the two circuits are joined, the client can send Tor RELAY cells to the server. RELAY_BEGIN cells open streams to an external process or processes configured by the server; RELAY_DATA cells are used to communicate data on those streams, and so forth.

In more detail: naming hidden services [NAMING]

A hidden service's name is its long term master identity key. This is encoded as a hostname by encoding the entire key in Base 32, including a version byte and a checksum, and then appending the string ".onion" at the end. The result is a 56-character domain name.

(This is a change from older versions of the hidden service protocol, where we used an 80-bit truncated SHA1 hash of a 1024 bit RSA key.)

The names in this format are distinct from earlier names because of their length. An older name might look like:

    unlikelynamefora.onion
    yyhws9optuwiwsns.onion

And a new name following this specification might look like:

    l5satjgud6gucryazcyvyvhuxhr74u6ygigiuyixe3a6ysis67ororad.onion

Please see section [ONIONADDRESS] for the encoding specification.

In more detail: Access control [IMD:AC]

Access control for a hidden service is imposed at multiple points through the process above. Furthermore, there is also the option to impose additional client authorization access control using pre-shared secrets exchanged out-of-band between the hidden service and its clients.

The first stage of access control happens when downloading HS descriptors. Specifically, in order to download a descriptor, clients must know which blinded signing key was used to sign it. (See the next section for more info on key blinding.)

To learn the introduction points, clients must decrypt the body of the hidden service descriptor. To do so, clients must know the unblinded public key of the service, which makes the descriptor unusable by entities without that knowledge (e.g. HSDirs that don't know the onion address).

Also, if optional client authorization is enabled, hidden service descriptors are superencrypted using each authorized user's identity x25519 key, to further ensure that unauthorized entities cannot decrypt it.

In order to make the introduction point send a rendezvous request to the service, the client needs to use the per-introduction-point authentication key found in the hidden service descriptor.

The final level of access control happens at the server itself, which may decide to respond or not respond to the client's request depending on the contents of the request. The protocol is extensible at this point: at a minimum, the server requires that the client demonstrate knowledge of the contents of the encrypted portion of the hidden service descriptor. If optional client authorization is enabled, the service may additionally require the client to prove knowledge of a pre-shared private key.

In more detail: Distributing hidden service descriptors. [IMD:DIST]

Periodically, hidden service descriptors become stored at different locations to prevent a single directory or small set of directories from becoming a good DoS target for removing a hidden service.

For each period, the Tor directory authorities agree upon a collaboratively generated random value. (See section 2.3 for a description of how to incorporate this value into the voting practice; generating the value is described in other proposals, including [SHAREDRANDOM-REFS].) That value, combined with hidden service directories' public identity keys, determines each HSDir's position in the hash ring for descriptors made in that period.

Each hidden service's descriptors are placed into the ring in positions based on the key that was used to sign them. Note that hidden service descriptors are not signed with the services' public keys directly. Instead, we use a key-blinding system [KEYBLIND] to create a new key-of-the-day for each hidden service. Any client that knows the hidden service's public identity key can derive these blinded signing keys for a given period. It should be impossible to derive the blinded signing key lacking that knowledge.

This is achieved using two nonces:

* A "credential", derived from the public identity key KP_hs_id. N_hs_cred.

* A "subcredential", derived from the credential N_hs_cred and information which varies with the current time period. N_hs_subcred.

The body of each descriptor is also encrypted with a key derived from the public signing key.

To avoid a "thundering herd" problem where every service generates and uploads a new descriptor at the start of each period, each descriptor comes online at a time during the period that depends on its blinded signing key. The keys for the last period remain valid until the new keys come online.

In more detail: Scaling to multiple hosts

This design is compatible with our current approaches for scaling hidden services. Specifically, hidden service operators can use onionbalance to achieve high availability between multiple nodes on the HSDir layer. Furthermore, operators can use proposal 255 to load balance their hidden services on the introduction layer. See [SCALING-REFS] for further discussions on this topic and alternative designs.

In more detail: Backward compatibility with older hidden service protocols

This design is incompatible with the clients, server, and hsdir node protocols from older versions of the hidden service protocol as described in rend-spec.txt. On the other hand, it is designed to enable the use of older Tor nodes as rendezvous points and introduction points.

In more detail: Keeping crypto keys offline

In this design, a hidden service's secret identity key may be stored offline. It's used only to generate blinded signing keys, which are used to sign descriptor signing keys.

In order to operate a hidden service, the operator can generate in advance a number of blinded signing keys and descriptor signing keys (and their credentials; see [DESC-OUTER] and [HS-DESC-ENC] below), and their corresponding descriptor encryption keys, and export those to the hidden service hosts.

As a result, in the scenario where the Hidden Service gets compromised, the adversary can only impersonate it for a limited period of time (depending on how many signing keys were generated in advance).

It's important to not send the private part of the blinded signing key to the Hidden Service since an attacker can derive from it the secret master identity key. The secret blinded signing key should only be used to create credentials for the descriptor signing keys.

(NOTE: although the protocol allows them, offline keys are not implemented as of 0.3.2.1-alpha.)

In more detail: Encryption Keys And Replay Resistance

To avoid replays of an introduction request by an introduction point, a hidden service host must never accept the same request twice. Earlier versions of the hidden service design used an authenticated timestamp here, but including a view of the current time can create a problematic fingerprint. (See proposal 222 for more discussion.)

In more detail: A menagerie of keys

[In the text below, an "encryption keypair" is roughly "a keypair you can do Diffie-Hellman with" and a "signing keypair" is roughly "a keypair you can do ECDSA with."]

Public/private keypairs defined in this document:

Master (hidden service) identity key -- A master signing keypair used as the identity for a hidden service. This key is long term and not used on its own to sign anything; it is only used to generate blinded signing keys as described in [KEYBLIND] and [SUBCRED]. The public key is encoded in the ".onion" address according to [NAMING]. KP_hs_id, KS_hs_id.

Blinded signing key -- A keypair derived from the identity key, used to sign descriptor signing keys. It changes periodically for each service. Clients who know a 'credential' consisting of the service's public identity key and an optional secret can derive the public blinded identity key for a service. This key is used as an index in the DHT-like structure of the directory system (see [SUBCRED]). KP_hs_blind_id, KS_hs_blind_id.

Descriptor signing key -- A key used to sign hidden service descriptors. This is signed by blinded signing keys. Unlike blinded signing keys and master identity keys, the secret part of this key must be stored online by hidden service hosts. The public part of this key is included in the unencrypted section of HS descriptors (see [DESC-OUTER]). KP_hs_desc_sign, KS_hs_desc_sign.

Introduction point authentication key -- A short-term signing keypair used to identify a hidden service's session at a given introduction point. The service makes a fresh keypair for each introduction point; these are used to sign the request that a hidden service host makes when establishing an introduction point, so that clients who know the public component of this key can get their introduction requests sent to the right service. No keypair is ever used with more than one introduction point. (previously called a "service key" in rend-spec.txt) KP_hs_ipt_sid, KS_hs_ipt_sid ("hidden service introduction point session id").

Introduction point encryption key -- A short-term encryption keypair used when establishing connections via an introduction point. Plays a role analogous to Tor nodes' onion keys. The service makes a fresh keypair for each introduction point. KP_hss_ntor, KS_hss_ntor.

Ephemeral descriptor encryption key -- A short-lived encryption keypair made by the service, and used to encrypt the inner layer of hidden service descriptors when client authentication is in use. KP_hss_desc_enc, KS_hss_desc_enc

Nonces defined in this document:

N_hs_desc_enc -- a nonce used to derive keys to decrypt the inner encryption layer of hidden service descriptors. This is sometimes also called a "descriptor cookie".

Public/private keypairs defined elsewhere:

Onion key -- Short-term encryption keypair (KS_ntor, KP_ntor).

(Node) identity key (KP_relayid).

Symmetric key-like things defined elsewhere:

KH from circuit handshake -- An unpredictable value derived as part of the Tor circuit extension handshake, used to tie a request to a particular circuit.

In even more detail: Client authorization keys [CLIENT-AUTH]

When client authorization is enabled, each authorized client of a hidden service has two more asymmetric keypairs which are shared with the hidden service. An entity without those keys is not able to use the hidden service. Throughout this document, we assume that these pre-shared keys are exchanged between the hidden service and its clients in a secure out-of-band fashion.

Specifically, each authorized client possesses:

- An x25519 keypair used to compute decryption keys that allow the client to decrypt the hidden service descriptor. See [HS-DESC-ENC]. This is the client's counterpart to KP_hss_desc_enc. KP_hsc_desc_enc, KS_hsc_desc_enc.

- An ed25519 keypair which allows the client to compute signatures which prove to the hidden service that the client is authorized. These signatures are inserted into the INTRODUCE1 cell, and without them the introduction to the hidden service cannot be completed. See [INTRO-AUTH]. KP_hsc_intro_auth, KS_hsc_intro_auth.

The right way to exchange these keys is to have the client generate keys and send the corresponding public keys to the hidden service out-of-band. An easier but less secure way of doing this exchange would be to have the hidden service generate the keypairs and pass the corresponding private keys to its clients. See section [CLIENT-AUTH-MGMT] for more details on how these keys should be managed.

[TODO: Also specify stealth client authorization.]

(NOTE: client authorization is implemented as of 0.3.5.1-alpha.)

Generating and publishing hidden service descriptors [HSDIR]

Hidden service descriptors follow the same metaformat as other Tor directory objects. They are published anonymously to Tor servers with the HSDir flag, HSDir=2 protocol version and tor version >= 0.3.0.8 (because a bug was fixed in this version).

Deriving blinded keys and subcredentials [SUBCRED]

In each time period (see [TIME-PERIODS] for a definition of time periods), a hidden service host uses a different blinded private key to sign its directory information, and clients use a different blinded public key as the index for fetching that information.

For a candidate for a key derivation method, see Appendix [KEYBLIND].

Additionally, clients and hosts derive a subcredential for each period. Knowledge of the subcredential is needed to decrypt hidden service descriptors for each period and to authenticate with the hidden service host in the introduction process. Unlike the credential, it changes each period. Knowing the subcredential, even in combination with the blinded private key, does not enable the hidden service host to derive the main credential--therefore, it is safe to put the subcredential on the hidden service host while leaving the hidden service's private key offline.

The subcredential for a period is derived as:

N_hs_subcred = H("subcredential" | N_hs_cred | blinded-public-key).

In the above formula, credential corresponds to:

N_hs_cred = H("credential" | public-identity-key)

where public-identity-key is the public identity master key of the hidden service.
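The two derivations above can be sketched as follows. This is an illustrative sketch only: it assumes that H is SHA3-256 and that "|" is plain byte concatenation (both are assumptions here, to be checked against this document's notation section), and the function names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """H(a | b | ...): hash of the concatenated parts (SHA3-256 is assumed)."""
    return hashlib.sha3_256(b"".join(parts)).digest()


def credential(public_identity_key: bytes) -> bytes:
    # N_hs_cred = H("credential" | public-identity-key)
    return h(b"credential", public_identity_key)


def subcredential(public_identity_key: bytes, blinded_public_key: bytes) -> bytes:
    # N_hs_subcred = H("subcredential" | N_hs_cred | blinded-public-key)
    return h(b"subcredential", credential(public_identity_key), blinded_public_key)
```

Because the blinded public key changes every time period, the subcredential changes with it, while the credential stays fixed for the lifetime of the identity key.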

2.2. Locating, uploading, and downloading hidden service descriptors [HASHRING]

To avoid attacks where a hidden service's descriptor is easily targeted for censorship, we store them at different directories over time, and use shared random values to prevent those directories from being predictable far in advance.

Which Tor servers host a hidden service's descriptors depends on:

  * the current time period,
  * the daily subcredential,
  * the hidden service directories' public keys,
  * a shared random value that changes in each time period, shared_random_value,
  * a set of network-wide networkstatus consensus parameters. (Consensus parameters are integer values voted on by authorities and published in the consensus documents, described in dir-spec.txt, section 3.3.)

Below we explain in more detail.

Dividing time into periods [TIME-PERIODS]

To prevent a single set of hidden service directories from becoming a target for adversaries looking to permanently censor a hidden service, hidden service descriptors are uploaded to different locations that change over time.

The length of a "time period" is controlled by the consensus parameter 'hsdir-interval', and is a number of minutes between 30 and 14400 (10 days). The default time period length is 1440 (one day).

Time periods start at the Unix epoch (Jan 1, 1970), and are computed by taking the number of minutes since the epoch and dividing by the time period. However, we want our time periods to start at a regular offset from the SRV voting schedule, so we subtract a "rotation time offset" of 12 voting periods from the number of minutes since the epoch, before dividing by the time period (effectively making "our" epoch start at Jan 1, 1970 12:00UTC when the voting period is 1 hour.)

Example: If the current time is 2016-04-13 11:15:01 UTC, then the number of seconds since the epoch is 1460546101, and the number of minutes since the epoch is 24342435. We then subtract the "rotation time offset" of 12*60 minutes from the minutes since the epoch, to get 24341715. If the current time period length is 1440 minutes, by doing the division we see that we are currently in time period number 16903.

Specifically, time period #16903 began 16903*1440*60 + (12*60*60) seconds after the epoch, at 2016-04-12 12:00 UTC, and ended at 16904*1440*60 + (12*60*60) seconds after the epoch, at 2016-04-13 12:00 UTC.
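The arithmetic in this example can be reproduced with a short sketch. The function and parameter names are illustrative, and the voting period is taken to be 60 minutes as in the example.

```python
def time_period_num(unix_seconds: int, period_length_min: int = 1440,
                    voting_period_min: int = 60) -> int:
    """Time period number for a given time, in seconds since the Unix epoch."""
    minutes_since_epoch = unix_seconds // 60
    rotation_offset = 12 * voting_period_min      # "rotation time offset"
    return (minutes_since_epoch - rotation_offset) // period_length_min


def time_period_start(period_num: int, period_length_min: int = 1440,
                      voting_period_min: int = 60) -> int:
    """Start of a time period, in seconds since the Unix epoch."""
    return period_num * period_length_min * 60 + 12 * voting_period_min * 60


# The example from the text: 2016-04-13 11:15:01 UTC falls in period 16903.
assert time_period_num(1460546101) == 16903
assert time_period_start(16903) == 1460462400    # 2016-04-12 12:00 UTC
```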

When to publish a hidden service descriptor [WHEN-HSDESC]

Hidden services periodically publish their descriptor to the responsible HSDirs. The set of responsible HSDirs is determined as specified in [WHERE-HSDESC].

Specifically, every time a hidden service publishes its descriptor, it also sets up a timer for a random time between 60 minutes and 120 minutes in the future. When the timer triggers, the hidden service needs to publish its descriptor again to the responsible HSDirs for that time period. [TODO: Control republish period using a consensus parameter?]

Overlapping descriptors

Hidden services need to upload multiple descriptors so that they can be reached by clients whose consensuses are older or newer than their own. Services need to upload their descriptors to the HSDirs before the beginning of each upcoming time period, so that they are readily available for clients to fetch them. Furthermore, services should keep uploading their old descriptor even after the end of a time period, so that they can be reached by clients that still have consensuses from the previous time period.

Hence, services maintain two active descriptors at every point. Clients, on the other hand, don't have a notion of overlapping descriptors, and instead always download the descriptor for the current time period and shared random value. It's the job of the service to ensure that descriptors will be available for all clients. See section [FETCHUPLOADDESC] for how this is achieved.

[TODO: What to do when we run multiple hidden services in a single host?]

Where to publish a hidden service descriptor [WHERE-HSDESC]

This section specifies how the HSDir hash ring is formed at any given time. Whenever a time value is needed (e.g. to get the current time period number), we assume that clients and services use the valid-after time from their latest live consensus.

The following consensus parameters control where a hidden service descriptor is stored:

    hsdir_n_replicas = an integer in range [1,16] with default value 2.
    hsdir_spread_fetch = an integer in range [1,128] with default value 3.
    hsdir_spread_store = an integer in range [1,128] with default value 4.
         (Until 0.3.2.8-rc, the default was 3.)

To determine where a given hidden service descriptor will be stored in a given period, after the blinded public key for that period is derived, the uploading or downloading party calculates:

    for replicanum in 1...hsdir_n_replicas:
        hs_service_index(replicanum) = H("store-at-idx" |
                                         blinded_public_key |
                                         INT_8(replicanum) |
                                         INT_8(period_length) |
                                         INT_8(period_num) )

where blinded_public_key is specified in section [KEYBLIND], period_length is the length of the time period in minutes, and period_num is calculated using the current consensus "valid-after" as specified in section [TIME-PERIODS].

Then, for each node listed in the current consensus with the HSDir flag, we compute a directory index for that node as:

hs_relay_index(node) = H("node-idx" | node_identity | shared_random_value | INT_8(period_num) | INT_8(period_length) )

where shared_random_value is the shared value generated by the authorities in section [PUB-SHAREDRANDOM], and node_identity is the ed25519 identity key of the node.

Finally, for replicanum in 1...hsdir_n_replicas, the hidden service host uploads descriptors to the first hsdir_spread_store nodes whose indices immediately follow hs_service_index(replicanum). If any of those nodes have already been selected for a lower-numbered replica of the service, any nodes already chosen are disregarded (i.e. skipped over) when choosing a replica's hsdir_spread_store nodes.

When choosing an HSDir to download from, clients choose randomly from among the first hsdir_spread_fetch nodes after the indices. (Note that, in order to make the system better tolerate disappearing HSDirs, hsdir_spread_fetch may be less than hsdir_spread_store.) Again, nodes from lower-numbered replicas are disregarded when choosing the spread for a replica.
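The selection procedure above can be sketched as follows. This is a sketch under stated assumptions, not a reference implementation: H is taken to be SHA3-256, INT_8 an 8-byte big-endian integer, indices are compared as byte strings, and the search wraps around the end of the ring. All function names are hypothetical.

```python
import hashlib
from bisect import bisect_right


def int_8(n: int) -> bytes:
    return n.to_bytes(8, "big")          # assumed encoding of INT_8


def h(*parts: bytes) -> bytes:
    return hashlib.sha3_256(b"".join(parts)).digest()   # assumed H


def hs_service_index(blinded_public_key, replicanum, period_length, period_num):
    return h(b"store-at-idx", blinded_public_key, int_8(replicanum),
             int_8(period_length), int_8(period_num))


def hs_relay_index(node_identity, shared_random_value, period_num, period_length):
    return h(b"node-idx", node_identity, shared_random_value,
             int_8(period_num), int_8(period_length))


def responsible_hsdirs(blinded_public_key, hsdir_identities, shared_random_value,
                       period_num, period_length, n_replicas=2, spread=4):
    """Return the HSDirs selected across all replicas, in selection order."""
    ring = sorted((hs_relay_index(node, shared_random_value, period_num,
                                  period_length), node)
                  for node in hsdir_identities)
    indices = [index for index, _ in ring]
    chosen = []
    for replicanum in range(1, n_replicas + 1):
        target = hs_service_index(blinded_public_key, replicanum,
                                  period_length, period_num)
        start = bisect_right(indices, target)    # first index after the target
        picked = 0
        for step in range(len(ring)):
            if picked == spread:
                break
            node = ring[(start + step) % len(ring)][1]
            if node not in chosen:               # skip nodes of earlier replicas
                chosen.append(node)
                picked += 1
    return chosen
```

A service would call this with spread set to hsdir_spread_store. A client would call it with hsdir_spread_fetch and then pick randomly among the nodes returned for a replica.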

Using time periods and SRVs to fetch/upload HS descriptors [FETCHUPLOADDESC]

Hidden services and clients need to make correct use of time periods (TP) and shared random values (SRVs) to successfully fetch and upload descriptors. Furthermore, to avoid problems with skewed clocks, both clients and services use the 'valid-after' time of a live consensus as a way to take decisions with regards to uploading and fetching descriptors. By using the consensus times as the ground truth here, we minimize the desynchronization of clients and services due to system clock. Whenever time-based decisions are taken in this section, assume that they are consensus times and not system times.

As [PUB-SHAREDRANDOM] specifies, consensuses contain two shared random values (the current one and the previous one). Hidden services and clients are asked to match these shared random values with descriptor time periods and use the right SRV when fetching/uploading descriptors. This section attempts to precisely specify how this works.

Let's start with an illustration of the system:

    +------------------------------------------------------------------+
    |                                                                  |
    | 00:00      12:00       00:00       12:00       00:00       12:00 |
    | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
    |                                                                  |
    |  $==========|-----------$===========|-----------$===========|    |
    |                                                                  |
    |                                                                  |
    +------------------------------------------------------------------+

                                  Legend: [TP#1 = Time Period #1]
                                          [SRV#1 = Shared Random Value #1]
                                          ["$" = descriptor rotation moment]

Client behavior for fetching descriptors [CLIENTFETCH]

And here is how clients use TPs and SRVs to fetch descriptors:

Clients always aim to synchronize their TP with SRV, so they always want to use TP#N with SRV#N: To achieve this wrt time periods, clients always use the current time period when fetching descriptors. Now wrt SRVs, if a client is in the time segment between a new time period and a new SRV (i.e. the segments drawn with "-") it uses the current SRV, else if the client is in a time segment between a new SRV and a new time period (i.e. the segments drawn with "="), it uses the previous SRV.

Example:

    +------------------------------------------------------------------+
    |                                                                  |
    | 00:00      12:00       00:00       12:00       00:00       12:00 |
    | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
    |                                                                  |
    |  $==========|-----------$===========|-----------$===========|    |
    |              ^           ^                                       |
    |              C1          C2                                      |
    +------------------------------------------------------------------+

If a client (C1) is at 13:00 right after TP#1, then it will use TP#1 and SRV#1 for fetching descriptors. Also, if a client (C2) is at 01:00 right after SRV#2, it will still use TP#1 and SRV#1.
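The client rule can be sketched as a small function. This sketch assumes the schedule drawn in the illustration (24-hour time periods, a new SRV at 00:00 UTC and a new time period at 12:00 UTC) and takes the consensus valid-after time as seconds since the epoch. The function name and the "current"/"previous" labels are illustrative.

```python
def client_srv_choice(valid_after: int) -> str:
    """Which consensus SRV a client pairs with the current time period."""
    hour_utc = (valid_after // 3600) % 24
    # 12:00-23:59 is a "-" segment (new TP started, new SRV not yet): current SRV.
    # 00:00-11:59 is a "=" segment (new SRV published, new TP not yet): previous SRV.
    return "current" if hour_utc >= 12 else "previous"
```

For C1 at 13:00 this returns "current" (SRV#1 is the consensus's current value), and for C2 at 01:00 it returns "previous" (SRV#2 is now current, so SRV#1 is the previous value).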

Service behavior for uploading descriptors [SERVICEUPLOAD]

As discussed above, services maintain two active descriptors at any time. We call these the "first" and "second" service descriptors. Services rotate their descriptor every time they receive a consensus with a valid_after time past the next SRV calculation time. They rotate their descriptors by discarding their first descriptor, pushing the second descriptor to the first, and rebuilding their second descriptor with the latest data.

Services, like clients, employ different logic for picking SRV and TP values depending on their position in the graph above. Here is the logic:

First descriptor upload logic [FIRSTDESCUPLOAD]

Here is the service logic for uploading its first descriptor:

When a service is in the time segment between a new time period and a new SRV (i.e. the segments drawn with "-"), it uses the previous time period and previous SRV for uploading its first descriptor: that's meant to cover for clients that have a consensus that is still in the previous time period.

Example: Consider in the above illustration that the service is at 13:00 right after TP#1. It will upload its first descriptor using TP#0 and SRV#0. So if a client still has an 11:00 consensus it will be able to access it based on the client logic above.

Now if a service is in the time segment between a new SRV and a new time period (i.e. the segments drawn with "=") it uses the current time period and the previous SRV for its first descriptor: that's meant to cover clients with an up-to-date consensus in the same time period as the service.

Example:

    +------------------------------------------------------------------+
    |                                                                  |
    | 00:00      12:00       00:00       12:00       00:00       12:00 |
    | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
    |                                                                  |
    |  $==========|-----------$===========|-----------$===========|    |
    |                          ^                                       |
    |                          S                                       |
    +------------------------------------------------------------------+

Consider that the service is at 01:00 right after SRV#2: it will upload its first descriptor using TP#1 and SRV#1.

Second descriptor upload logic [SECONDDESCUPLOAD]

Here is the service logic for uploading its second descriptor:

When a service is in the time segment between a new time period and a new SRV (i.e. the segments drawn with "-"), it uses the current time period and current SRV for uploading its second descriptor: that's meant to cover for clients that have an up-to-date consensus on the same TP as the service.

Example: Consider in the above illustration that the service is at 13:00 right after TP#1: it will upload its second descriptor using TP#1 and SRV#1.

Now if a service is in the time segment between a new SRV and a new time period (i.e. the segments drawn with "=") it uses the next time period and the current SRV for its second descriptor: that's meant to cover clients with a newer consensus than the service (in the next time period).

Example:

    +------------------------------------------------------------------+
    |                                                                  |
    | 00:00      12:00       00:00       12:00       00:00       12:00 |
    | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
    |                                                                  |
    |  $==========|-----------$===========|-----------$===========|    |
    |                          ^                                       |
    |                          S                                       |
    +------------------------------------------------------------------+

Consider that the service is at 01:00 right after SRV#2: it will upload its second descriptor using TP#2 and SRV#2.
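The first and second descriptor rules can be summarized in one sketch. It assumes the schedule drawn in the illustrations (24-hour time periods, a new SRV at 00:00 UTC and a new time period at 12:00 UTC). The function name and the returned structure are illustrative; "previous" and "current" refer to the two SRVs listed in the service's consensus.

```python
from typing import Dict, Tuple


def service_descriptor_plan(valid_after: int,
                            current_tp: int) -> Dict[str, Tuple[int, str]]:
    """Map each descriptor to the (time period number, consensus SRV) it uses."""
    hour_utc = (valid_after // 3600) % 24
    if hour_utc >= 12:
        # "-" segment: between a new time period and a new SRV.
        return {"first": (current_tp - 1, "previous"),
                "second": (current_tp, "current")}
    # "=" segment: between a new SRV and a new time period.
    return {"first": (current_tp, "previous"),
            "second": (current_tp + 1, "current")}
```

With current_tp = 1, a service at 13:00 gets TP#0 with the previous SRV (SRV#0) and TP#1 with the current SRV (SRV#1). A service at 01:00 gets TP#1 with the previous SRV (SRV#1) and TP#2 with the current SRV (SRV#2), matching the examples above.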

Directory behavior for handling descriptor uploads [DIRUPLOAD]

Upon receiving a hidden service descriptor publish request, directories MUST check the following:

  * The outer wrapper of the descriptor can be parsed according to [DESC-OUTER]
  * The version-number of the descriptor is "3"
  * If the directory has already cached a descriptor for this hidden service, the revision-counter of the uploaded descriptor must be greater than the revision-counter of the cached one
  * The descriptor signature is valid

If any of these basic validity checks fails, the directory MUST reject the descriptor upload.
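The four checks can be sketched as a single acceptance function. Parsing and signature verification are passed in as callables because they are out of scope here; OuterDescriptor, accept_upload and the cache layout (blinded key mapped to the highest accepted revision counter) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OuterDescriptor:
    version: int
    revision_counter: int
    blinded_key: bytes


def accept_upload(raw: bytes,
                  cache: Dict[bytes, int],
                  parse_outer: Callable[[bytes], Optional[OuterDescriptor]],
                  signature_ok: Callable[[bytes, OuterDescriptor], bool]) -> bool:
    """Return True and update the cache only if every check passes."""
    desc = parse_outer(raw)                 # check 1: outer wrapper parses
    if desc is None:
        return False
    if desc.version != 3:                   # check 2: version-number is "3"
        return False
    cached = cache.get(desc.blinded_key)    # check 3: revision-counter increases
    if cached is not None and desc.revision_counter <= cached:
        return False
    if not signature_ok(raw, desc):         # check 4: signature is valid
        return False
    cache[desc.blinded_key] = desc.revision_counter
    return True
```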

NOTE: Even if the descriptor passes the checks above, its first and second layers could still be invalid: directories cannot validate the encrypted layers of the descriptor, as they do not have access to the public key of the service (required for decrypting the first layer of encryption), or the necessary client credentials (for decrypting the second layer).

Expiring hidden service descriptors [EXPIRE-DESC]

Hidden services set their descriptor's "descriptor-lifetime" field to 180 minutes (3 hours). Hidden services ensure that their descriptor will remain valid in the HSDir caches, by republishing their descriptors periodically as specified in [WHEN-HSDESC].

Hidden services MUST also keep their introduction circuits alive for as long as descriptors including those intro points are valid (even if that's after the time period has changed).

URLs for anonymous uploading and downloading

Hidden service descriptors conforming to this specification are uploaded with an HTTP POST request to the URL /tor/hs/<version>/publish relative to the hidden service directory's root, and downloaded with an HTTP GET request for the URL /tor/hs/<version>/<z> where <z> is a base64 encoding of the hidden service's blinded public key and <version> is the protocol version which is "3" in this case.

These requests must be made anonymously, on circuits not used for anything else.

Client-side validation of onion addresses

When a Tor client receives a prop224 onion address from the user, it MUST first validate the onion address before attempting to connect or fetch its descriptor. If the validation fails, the client MUST refuse to connect.

As part of the address validation, Tor clients should check that the underlying ed25519 key does not have a torsion component. If Tor accepted ed25519 keys with torsion components, attackers could create multiple equivalent onion addresses for a single ed25519 key, which would map to the same service. We want to avoid that because it could lead to phishing attacks and surprising behaviors (e.g. imagine a browser plugin that blocks onion addresses, but could be bypassed using an equivalent onion address with a torsion component).

The right way for clients to detect such fraudulent addresses (which should only occur malevolently and never naturally) is to extract the ed25519 public key from the onion address and multiply it by the ed25519 group order and ensure that the result is the ed25519 identity element. For more details, please see [TORSION-REFS].
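The check described above (multiply by the group order, compare with the identity element) can be sketched in pure Python. This is a slow, non-constant-time illustration using the standard ed25519 curve constants and affine coordinates; it is not how a production client would implement it, and all function names are hypothetical.

```python
P = 2**255 - 19                                          # field prime
L = 2**252 + 27742317777372353535851937790883648493      # prime group order
D = -121665 * pow(121666, P - 2, P) % P                  # curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)                        # a square root of -1
IDENTITY = (0, 1)


def decode_point(encoded: bytes):
    """Decompress a 32-byte point into affine (x, y); None if invalid."""
    if len(encoded) != 32:
        return None
    n = int.from_bytes(encoded, "little")
    sign, y = n >> 255, n & ((1 << 255) - 1)
    if y >= P:
        return None
    xx = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(xx, (P + 3) // 8, P)
    if (x * x - xx) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - xx) % P != 0 or (x == 0 and sign):
        return None
    if (x & 1) != sign:
        x = P - x
    return (x, y)


def encode_point(point) -> bytes:
    x, y = point
    return (y | ((x & 1) << 255)).to_bytes(32, "little")


def add(a, b):
    """Twisted Edwards addition for -x^2 + y^2 = 1 + d*x^2*y^2."""
    (x1, y1), (x2, y2) = a, b
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * pow(1 + t, P - 2, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, P - 2, P) % P
    return (x3, y3)


def scalar_mult(k: int, point):
    result = IDENTITY
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def has_torsion_component(encoded_key: bytes) -> bool:
    """True if [L]A is not the identity, i.e. the key must be rejected."""
    point = decode_point(encoded_key)
    if point is None:
        raise ValueError("not a valid ed25519 point encoding")
    return scalar_mult(L, point) != IDENTITY
```

For example, adding the order-2 point (0, -1) to the standard base point yields a key for which has_torsion_component returns True, while the base point itself passes the check.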

Publishing shared random values [PUB-SHAREDRANDOM]

Our design for limiting the predictability of HSDir upload locations relies on a shared random value (SRV) that isn't predictable in advance or too influenceable by an attacker. The authorities must run a protocol to generate such a value at least once per hsdir period. Here we describe how they publish these values; the procedure they use to generate them can change independently of the rest of this specification. For more information see [SHAREDRANDOM-REFS].

According to proposal 250, we add two new lines in consensuses:

    "shared-rand-previous-value" SP NUM_REVEALS SP VALUE NL
    "shared-rand-current-value" SP NUM_REVEALS SP VALUE NL

Client behavior in the absence of shared random values

If the previous or current shared random value cannot be found in a consensus, then Tor clients and services need to generate their own random value for use when choosing HSDirs.

To do so, Tor clients and services use:

SRV = H("shared-random-disaster" | INT_8(period_length) | INT_8(period_num))

where period_length is the length of a time period in minutes, rounded down; period_num is calculated as specified in [TIME-PERIODS] for the wanted shared random value that could not be found originally.
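A minimal sketch of this fallback, assuming H is SHA3-256 and INT_8 is an 8-byte big-endian integer (both assumptions here; check them against this document's notation section). The function name is illustrative.

```python
import hashlib


def disaster_srv(period_length: int, period_num: int) -> bytes:
    """SRV = H("shared-random-disaster" | INT_8(period_length) | INT_8(period_num))"""
    return hashlib.sha3_256(
        b"shared-random-disaster"
        + period_length.to_bytes(8, "big")
        + period_num.to_bytes(8, "big")
    ).digest()
```

Because the inputs are public, every client and service that falls back to this value for the same time period computes the same SRV, at the cost of the value being predictable.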

Hidden services and changing shared random values

It's theoretically possible that the consensus shared random values will change or disappear in the middle of a time period because of directory authorities dropping offline or misbehaving.

To avoid client reachability issues in this rare event, hidden services should use the new shared random values to find the new responsible HSDirs and upload their descriptors there.

XXX How long should they upload descriptors there for?

Hidden service descriptors: outer wrapper [DESC-OUTER]

The format for a hidden service descriptor is as follows, using the meta-format from dir-spec.txt.

"hs-descriptor" SP version-number NL

[At start, exactly once.]

The version-number is a 32 bit unsigned integer indicating the version of the descriptor. Current version is "3".

"descriptor-lifetime" SP LifetimeMinutes NL

[Exactly once]

The lifetime of a descriptor in minutes. An HSDir SHOULD expire the hidden service descriptor at least LifetimeMinutes after it was uploaded.

The LifetimeMinutes field can take values between 30 and 720 (12 hours).

"descriptor-signing-key-cert" NL certificate NL

[Exactly once.]

The 'certificate' field contains a certificate in the format from proposal 220, wrapped with "-----BEGIN ED25519 CERT-----". The certificate cross-certifies the short-term descriptor signing key with the blinded public key. The certificate type must be [08], and the blinded public key must be present as the signing-key extension.

"revision-counter" SP Integer NL

[Exactly once.]

The revision number of the descriptor. If an HSDir receives a second descriptor for a key that it already has a descriptor for, it should retain and serve the descriptor with the higher revision-counter.

(Checking for monotonically increasing revision-counter values prevents an attacker from replacing a newer descriptor signed by a given key with a copy of an older version.)

Implementations MUST be able to parse 64-bit values for these counters.

"superencrypted" NL encrypted-string

[Exactly once.]

An encrypted blob, whose format is discussed in [HS-DESC-ENC] below. The blob is base64 encoded and enclosed in -----BEGIN MESSAGE----- and -----END MESSAGE----- wrappers. (The resulting document does not end with a newline character.)

"signature" SP signature NL

[exactly once, at end.]

A signature of all previous fields, using the signing key in the descriptor-signing-key-cert line, prefixed by the string "Tor onion service descriptor sig v3". We use a separate key for signing, so that the hidden service host does not need to have its private blinded key online.

HSDirs accept hidden service descriptors of up to 50k bytes (a consensus parameter should also be introduced to control this value).
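The numeric constraints on the single-line fields can be sketched as a small checker. It covers only "hs-descriptor", "descriptor-lifetime" and "revision-counter"; certificates, the encrypted blob and the signature are out of scope. Reading "50k" as 50,000 bytes and treating the revision counter as an unsigned 64-bit value are assumptions, and the function names are hypothetical.

```python
MAX_DESC_LEN = 50_000    # assumption: "50k bytes" read as 50,000
SINGLE_LINE_FIELDS = ("hs-descriptor", "descriptor-lifetime", "revision-counter")


def check_outer_fields(text: str) -> dict:
    """Return the parsed integer fields, or raise ValueError."""
    if len(text.encode("utf-8")) > MAX_DESC_LEN:
        raise ValueError("descriptor too large")
    lines = text.split("\n")
    if not lines[0].startswith("hs-descriptor "):
        raise ValueError("hs-descriptor must be the first line")
    fields = {}
    for line in lines:
        keyword, _, rest = line.partition(" ")
        if keyword in SINGLE_LINE_FIELDS:
            if keyword in fields:
                raise ValueError(keyword + " appears more than once")
            fields[keyword] = int(rest)     # int() raises ValueError on junk
    if len(fields) != len(SINGLE_LINE_FIELDS):
        raise ValueError("a required field is missing")
    if fields["hs-descriptor"] != 3:
        raise ValueError("unsupported version")
    if not 30 <= fields["descriptor-lifetime"] <= 720:
        raise ValueError("descriptor-lifetime out of range")
    if not 0 <= fields["revision-counter"] < 2**64:
        raise ValueError("revision-counter does not fit in 64 bits")
    return fields


def outer_fields_ok(text: str) -> bool:
    try:
        check_outer_fields(text)
    except ValueError:
        return False
    return True
```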

Hidden service descriptors: encryption format [HS-DESC-ENC]

Hidden service descriptors are protected by two layers of encryption. Clients need to decrypt both layers to connect to the hidden service.

The first layer of encryption provides confidentiality against entities who don't know the public key of the hidden service (e.g. HSDirs), while the second layer of encryption is only useful when client authorization is enabled and protects against entities that do not possess valid client credentials.

First layer of encryption [HS-DESC-FIRST-LAYER]

The first layer of HS descriptor encryption is designed to protect descriptor confidentiality against entities who don't know the public identity key of the hidden service.

First layer encryption logic

The encryption keys and format for the first layer of encryption are generated as specified in [HS-DESC-ENCRYPTION-KEYS] with customization parameters:

    SECRET_DATA = blinded-public-key
    STRING_CONSTANT = "hsdir-superencrypted-data"

The encryption scheme in [HS-DESC-ENCRYPTION-KEYS] uses the service credential which is derived from the public identity key (see [SUBCRED]) to ensure that only entities who know the public identity key can decrypt the first descriptor layer.

The ciphertext is placed on the "superencrypted" field of the descriptor.

Before encryption the plaintext is padded with NUL bytes to the nearest multiple of 10k bytes.
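A minimal sketch of this padding step. Since padding can only add bytes, "nearest multiple" is read here as rounding up to the next multiple; reading "10k" as 10,000 bytes is also an assumption, so the multiple is a parameter. The function name is illustrative.

```python
def pad_plaintext(plaintext: bytes, multiple: int = 10_000) -> bytes:
    """Append NUL bytes until the length is a multiple of `multiple`."""
    padded_len = -(-len(plaintext) // multiple) * multiple   # ceiling division
    return plaintext + b"\x00" * (padded_len - len(plaintext))
```

Padding to a coarse multiple hides the exact plaintext size, which would otherwise hint at details such as the number of introduction points.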

First layer plaintext format

After clients decrypt the first layer of encryption, they need to parse the plaintext to get to the second layer ciphertext which is contained in the "encrypted" field.

If client auth is enabled, the hidden service generates a fresh descriptor_cookie key (N_hs_desc_enc, 32 random bytes) and encrypts it using each authorized client's identity x25519 key. Authorized clients can use the descriptor cookie (N_hs_desc_enc) to decrypt the second (inner) layer of encryption. Our encryption scheme requires the hidden service to also generate an ephemeral x25519 keypair for each new descriptor.

If client auth is disabled, fake data is placed in each of the fields below to obfuscate whether client authorization is enabled.

Here are all the supported fields:

"desc-auth-type" SP type NL

[Exactly once]

This field contains the type of authorization used to protect the descriptor. The only recognized type is "x25519" and specifies the encryption scheme described in this section.

If client authorization is disabled, the value here should be "x25519".

"desc-auth-ephemeral-key" SP KP_hs_desc_ephem NL

[Exactly once]

This field contains `KP_hss_desc_enc`, an ephemeral x25519 public key generated by the hidden service and encoded in base64. The key is used by the encryption scheme below.

If client authorization is disabled, the value here should be a fresh x25519 pubkey that will remain unused.

"auth-client" SP client-id SP iv SP encrypted-cookie

[At least once]

When client authorization is enabled, the hidden service inserts an "auth-client" line for each of its authorized clients. If client authorization is disabled, the fields here can be populated with random data of the right size (that's 8 bytes for 'client-id', 16 bytes for 'iv' and 16 bytes for 'encrypted-cookie' all encoded with base64).

When client authorization is enabled, each "auth-client" line contains the descriptor cookie `N_hs_desc_enc` encrypted to each individual client. We assume that each authorized client possesses a pre-shared x25519 keypair (`KP_hsc_desc_enc`) which is used to decrypt the descriptor cookie.

We now describe the descriptor cookie encryption scheme. Here is what the hidden service computes:

    SECRET_SEED = x25519(KS_hs_desc_ephem, KP_hsc_desc_enc)
    KEYS = KDF(N_hs_subcred | SECRET_SEED, 40)
    CLIENT-ID = first 8 bytes of KEYS
    COOKIE-KEY = last 32 bytes of KEYS

Here is a description of the fields in the "auth-client" line:

- The "client-id" field is CLIENT-ID from above encoded in base64.

- The "iv" field is 16 random bytes encoded in base64.

- The "encrypted-cookie" field contains the descriptor cookie ciphertext as follows and is encoded in base64:

      encrypted-cookie = STREAM(iv, COOKIE-KEY) XOR N_hs_desc_enc.

See section [FIRST-LAYER-CLIENT-BEHAVIOR] for the client-side logic of how to decrypt the descriptor cookie.

"encrypted" NL encrypted-string

[Exactly once]

An encrypted blob containing the second layer ciphertext, whose format is discussed in [HS-DESC-SECOND-LAYER] below. The blob is base64 encoded and enclosed in -----BEGIN MESSAGE----- and -----END MESSAGE----- wrappers.

Compatibility note: The C Tor implementation does not include a final newline when generating this first-layer-plaintext section; other implementations MUST accept this section even if it is missing its final newline. Other implementations MAY generate this section without a final newline themselves, to avoid being distinguishable from C tor.

Client behavior [FIRST-LAYER-CLIENT-BEHAVIOR]

The goal of clients at this stage is to decrypt the "encrypted" field as described in [HS-DESC-SECOND-LAYER].

If client authorization is enabled, authorized clients need to extract the descriptor cookie to proceed with decryption of the second layer as follows:

An authorized client parsing the first layer of an encrypted descriptor extracts the ephemeral key from "desc-auth-ephemeral-key" and calculates CLIENT-ID and COOKIE-KEY as described in the section above using their x25519 private key. The client then uses CLIENT-ID to find the right "auth-client" field which contains the ciphertext of the descriptor cookie. The client then uses COOKIE-KEY and the iv to decrypt the descriptor_cookie, which is used to decrypt the second layer of descriptor encryption as described in [HS-DESC-SECOND-LAYER].
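The key-splitting and line-lookup steps can be sketched as follows. The x25519 exchange and the STREAM cipher are left out because they need a cryptographic library, so SECRET_SEED is taken as an input. The sketch assumes KDF is SHAKE-256 (an assumption to be checked against this document's notation section) and accepts base64 with or without "=" padding. The function names are hypothetical.

```python
import base64
import hashlib
from typing import Iterable, Optional, Tuple


def derive_client_keys(subcredential: bytes,
                       secret_seed: bytes) -> Tuple[bytes, bytes]:
    """KEYS = KDF(N_hs_subcred | SECRET_SEED, 40), split into the two parts."""
    keys = hashlib.shake_256(subcredential + secret_seed).digest(40)
    client_id = keys[:8]        # CLIENT-ID  = first 8 bytes of KEYS
    cookie_key = keys[-32:]     # COOKIE-KEY = last 32 bytes of KEYS
    return client_id, cookie_key


def _b64decode(text: str) -> bytes:
    return base64.b64decode(text + "=" * (-len(text) % 4))


def find_auth_client(lines: Iterable[str],
                     client_id: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (iv, encrypted_cookie) from the matching "auth-client" line."""
    for line in lines:
        parts = line.split(" ")
        if len(parts) == 4 and parts[0] == "auth-client":
            if _b64decode(parts[1]) == client_id:
                return _b64decode(parts[2]), _b64decode(parts[3])
    return None
```

A client that finds no matching line is not authorized for this descriptor (or the descriptor carries only fake entries) and cannot recover the descriptor cookie.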

Hiding client authorization data

Hidden services should avoid leaking whether client authorization is enabled or how many authorized clients there are. Hence even when client authorization is disabled, the hidden service adds fake "desc-auth-type", "desc-auth-ephemeral-key" and "auth-client" lines to the descriptor, as described in [HS-DESC-FIRST-LAYER]. The hidden service also avoids leaking the number of authorized clients by adding fake "auth-client" entries to its descriptor. Specifically, descriptors always contain a number of authorized clients that is a multiple of 16 by adding fake "auth-client" entries if needed. [XXX consider randomization of the value 16] Clients MUST accept descriptors with any number of "auth-client" lines as long as the total descriptor size is within the max limit of 50k (also controlled with a consensus parameter).
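The padding rule can be sketched as follows. Fake entries use the field sizes given in [HS-DESC-FIRST-LAYER] (8, 16 and 16 random bytes). The sketch emits at least one full group of 16 even when there are no real clients, because "auth-client" must appear at least once; that reading, the base64 padding convention and the function names are assumptions. The passage does not specify line ordering, so fake lines are simply appended.

```python
import base64
import os
from typing import List, Sequence


def fake_auth_client_line() -> str:
    def field(num_bytes: int) -> str:
        return base64.b64encode(os.urandom(num_bytes)).decode("ascii")
    # client-id (8 bytes), iv (16 bytes), encrypted-cookie (16 bytes)
    return "auth-client {} {} {}".format(field(8), field(16), field(16))


def pad_auth_client_lines(real_lines: Sequence[str],
                          multiple: int = 16) -> List[str]:
    """Pad the list with fake lines up to the next multiple of `multiple`."""
    rounded_up = -(-len(real_lines) // multiple) * multiple
    target = max(multiple, rounded_up)
    fakes = [fake_auth_client_line() for _ in range(target - len(real_lines))]
    return list(real_lines) + fakes
```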

Second layer of encryption [HS-DESC-SECOND-LAYER]

The second layer of descriptor encryption is designed to protect descriptor confidentiality against unauthorized clients. If client authorization is enabled, it's encrypted using the descriptor_cookie, and contains needed information for connecting to the hidden service, like the list of its introduction points.

If client authorization is disabled, then the second layer of HS encryption does not offer any additional security, but is still used.

Second layer encryption keys

The encryption keys and format for the second layer of encryption are generated as specified in [HS-DESC-ENCRYPTION-KEYS] with customization parameters as follows:

    SECRET_DATA = blinded-public-key | descriptor_cookie
    STRING_CONSTANT = "hsdir-encrypted-data"

If client authorization is disabled the 'descriptor_cookie' field is left blank.

The ciphertext is placed on the "encrypted" field of the descriptor.

Second layer plaintext format

After decrypting the second layer ciphertext, clients can finally learn the list of intro points etc. The plaintext has the following format:

"create2-formats" SP formats NL

[Exactly once]

A space-separated list of integers denoting CREATE2 cell HTYPEs (handshake types) that the server recognizes. Must include at least ntor as described in tor-spec.txt. See tor-spec section 5.1 for a list of recognized handshake types.

"intro-auth-required" SP types NL

[At most once]

A space-separated list of introduction-layer authentication types; see section [INTRO-AUTH] for more info. A client that does not support at least one of these authentication types will not be able to contact the host. Recognized types are: 'ed25519'.

"single-onion-service"

[None or at most once]

If present, this line indicates that the service is a Single Onion Service (see prop260 for more details about that type of service). This field was introduced in 0.3.0, meaning 0.2.9 services don't include it.

Followed by zero or more introduction points as follows (see section [NUM_INTRO_POINT] below for accepted values):

"introduction-point" SP link-specifiers NL

[Exactly once per introduction point at start of introduction point section]

The link-specifiers is a base64 encoding of a link specifier block in the format described in [BUILDING-BLOCKS] above.

As of 0.4.1.1-alpha, services include both IPv4 and IPv6 link specifiers in descriptors. All available addresses SHOULD be included in the descriptor, regardless of the address that the onion service actually used to connect/extend to the intro point.

The client SHOULD NOT reject any LSTYPE fields which it doesn't recognize; instead, it should use them verbatim in its EXTEND request to the introduction point.

The client SHOULD perform the basic validity checks on the link specifiers in the descriptor, described in `tor-spec.txt` section 5.1.2. These checks SHOULD NOT leak detailed information about the client's version, configuration, or consensus. (See 3.3 for service link specifier handling.)

When connecting to the introduction point, the client SHOULD send this list of link specifiers verbatim, in the same order as given here.

The client MAY reject the list of link specifiers if it is inconsistent with relay information from the directory, but SHOULD NOT modify it.

"onion-key" SP "ntor" SP key NL

[Exactly once per introduction point]

The key is a base64 encoded curve25519 public key which is the onion key of the introduction point Tor node used for the ntor handshake when a client extends to it.

"onion-key" SP KeyType SP key.. NL

[Any number of times]

Implementations should accept other types of onion keys using this syntax (where "KeyType" is some string other than "ntor"); unrecognized key types should be ignored.

"auth-key" NL certificate NL

[Exactly once per introduction point]

The certificate is a proposal 220 certificate wrapped in "-----BEGIN ED25519 CERT-----". It contains the introduction point authentication key (`KP_hs_ipt_sid`), signed by the descriptor signing key (`KP_hs_desc_sign`). The certificate type must be [09], and the signing key extension is mandatory.

NOTE: This certificate was originally intended to be constructed the other way around: the signing and signed keys are meant to be reversed. However, C tor implemented it backwards, and other implementations now need to do the same in order to conform. (Since this section is inside the descriptor, which is _already_ signed by `KP_hs_desc_sign`, the verification aspect of this certificate serves no point in its current form.)

"enc-key" SP "ntor" SP key NL

[Exactly once per introduction point]

The key is a base64 encoded curve25519 public key used to encrypt the introduction request to the service. (`KP_hss_ntor`)

"enc-key" SP KeyType SP key.. NL

[Any number of times]

Implementations should accept other types of onion keys using this syntax (where "KeyType" is some string other than "ntor"); unrecognized key types should be ignored.

"enc-key-cert" NL certificate NL

[Exactly once per introduction point]

Cross-certification of the encryption key using the descriptor signing key.

For "ntor" keys, certificate is a proposal 220 certificate wrapped in "-----BEGIN ED25519 CERT-----" armor. The subject key is the ed25519 equivalent of a curve25519 public encryption key (`KP_hss_ntor`), with the ed25519 key derived using the process in proposal 228 appendix A. The signing key is the descriptor signing key (`KP_hs_desc_sign`). The certificate type must be [0B], and the signing-key extension is mandatory.

NOTE: As with "auth-key", this certificate was intended to be constructed the other way around. However, for compatibility with C tor, implementations need to construct it this way. It serves even less point than "auth-key", however, since the encryption key `KP_hss_ntor` is already available from the `enc-key` entry.

"legacy-key" NL key NL

[None or at most once per introduction point]

[This field is obsolete and should never be generated; it is included for historical reasons only.]

The key is an ASN.1 encoded RSA public key in PEM format used for a legacy introduction point as described in [LEGACY_EST_INTRO].

This field is only present if the introduction point only supports legacy protocol (v2) that is <= 0.2.9 or the protocol version value "HSIntro 3".

"legacy-key-cert" NL certificate NL

[None or at most once per introduction point]

[This field is obsolete and should never be generated; it is included for historical reasons only.]

MUST be present if "legacy-key" is present.

The certificate is a proposal 220 RSA->Ed cross-certificate wrapped in "-----BEGIN CROSSCERT-----" armor, cross-certifying the RSA public key found in "legacy-key" using the descriptor signing key.

To remain compatible with future revisions to the descriptor format, clients should ignore unrecognized lines in the descriptor. Other encryption and authentication key formats are allowed; clients should ignore ones they do not recognize.

Clients who manage to extract the introduction points of the hidden service can proceed with the introduction protocol as specified in [INTRO-PROTOCOL].

Compatibility note: At least some versions of OnionBalance do not include a final newline when generating this inner plaintext section; other implementations MUST accept this section even if it is missing its final newline.

Deriving hidden service descriptor encryption keys [HS-DESC-ENCRYPTION-KEYS]

In this section we present the generic encryption format for hidden service descriptors. We use the same encryption format in both encryption layers, hence we introduce two customization parameters SECRET_DATA and STRING_CONSTANT which vary between the layers.

The SECRET_DATA parameter specifies the secret data that are used during encryption key generation, while STRING_CONSTANT is merely a string constant that is used as part of the KDF.

Here is the key generation logic:

    SALT = 16 bytes from H(random), changes each time we rebuild the descriptor even if the content of the descriptor hasn't changed. (So that we don't leak whether the intro point list etc. changed)

    secret_input = SECRET_DATA | N_hs_subcred | INT_8(revision_counter)

    keys = KDF(secret_input | SALT | STRING_CONSTANT, S_KEY_LEN + S_IV_LEN + MAC_KEY_LEN)

    SECRET_KEY = first S_KEY_LEN bytes of keys
    SECRET_IV  = next S_IV_LEN bytes of keys
    MAC_KEY    = last MAC_KEY_LEN bytes of keys

The encrypted data has the format:

    SALT       hashed random bytes from above  [16 bytes]
    ENCRYPTED  The ciphertext                  [variable]
    MAC        D_MAC of both above fields      [32 bytes]

The final encryption format is ENCRYPTED = STREAM(SECRET_IV,SECRET_KEY) XOR Plaintext.

Where D_MAC = H(mac_key_len | MAC_KEY | salt_len | SALT | ENCRYPTED) and mac_key_len = htonll(len(MAC_KEY)) and salt_len = htonll(len(SALT)).
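The key generation logic above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. It assumes that KDF is SHAKE-256 and H is SHA3-256, that S_KEY_LEN, S_IV_LEN and MAC_KEY_LEN are 32, 16 and 32 bytes, and that INT_8 and htonll both produce an 8-byte big-endian integer; none of these values is stated in this section. The function names are hypothetical, and the STREAM step is omitted because the standard library has no stream cipher.

```python
import hashlib
import os
import struct

# Assumed lengths (not stated in this section).
S_KEY_LEN, S_IV_LEN, MAC_KEY_LEN = 32, 16, 32


def new_salt():
    """SALT = 16 bytes from H(random); H is assumed to be SHA3-256."""
    return hashlib.sha3_256(os.urandom(32)).digest()[:16]


def derive_desc_keys(secret_data, subcredential, revision_counter,
                     salt, string_constant):
    """Return (SECRET_KEY, SECRET_IV, MAC_KEY) for one encryption layer."""
    # secret_input = SECRET_DATA | N_hs_subcred | INT_8(revision_counter)
    secret_input = (secret_data + subcredential
                    + struct.pack(">Q", revision_counter))
    total = S_KEY_LEN + S_IV_LEN + MAC_KEY_LEN
    # KDF is assumed to be SHAKE-256 with a variable-length output.
    keys = hashlib.shake_256(
        secret_input + salt + string_constant).digest(total)
    secret_key = keys[:S_KEY_LEN]
    secret_iv = keys[S_KEY_LEN:S_KEY_LEN + S_IV_LEN]
    mac_key = keys[S_KEY_LEN + S_IV_LEN:]
    return secret_key, secret_iv, mac_key


def desc_mac(mac_key, salt, encrypted):
    """D_MAC = H(mac_key_len | MAC_KEY | salt_len | SALT | ENCRYPTED)."""
    # htonll: a 64-bit length in network (big-endian) byte order.
    data = (struct.pack(">Q", len(mac_key)) + mac_key
            + struct.pack(">Q", len(salt)) + salt + encrypted)
    return hashlib.sha3_256(data).digest()
```

Because the salt is regenerated on every rebuild, two descriptors with identical plaintext still produce unrelated keys and ciphertexts.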

Number of introduction points [NUM_INTRO_POINT]

This section defines the minimum, default, and maximum number of introduction points that a hidden service descriptor can have:

    Minimum: 0
    Default: 3
    Maximum: 20

A value of 0 means that the service is still alive but doesn't want to be reached by any client at the moment. Note that the descriptor size increases considerably as more introduction points are added.

The reason for a maximum value of 20 is to give enough scalability to tools like OnionBalance to be able to load balance up to 120 servers (20 x 6 HSDirs), while also keeping the descriptor size from overwhelming hidden service directories with user-defined values that could be gigantic.

The introduction protocol [INTRO-PROTOCOL]

The introduction protocol proceeds in three steps.

First, a hidden service host builds an anonymous circuit to a Tor node and registers that circuit as an introduction point.

Single Onion Services attempt to build a non-anonymous single-hop circuit, but use an anonymous 3-hop circuit if:

  * the intro point is on an address that is configured as unreachable via a direct connection, or
  * the initial attempt to connect to the intro point over a single-hop circuit fails, and they are retrying the intro point connection.

[After 'First' and before 'Second', the hidden service publishes its introduction points and associated keys, and the client fetches them as described in section [HSDIR] above.]

Second, a client builds an anonymous circuit to the introduction point, and sends an introduction request.

Third, the introduction point relays the introduction request along the introduction circuit to the hidden service host, and acknowledges the introduction request to the client.

Registering an introduction point [REG_INTRO_POINT]

Extensible ESTABLISH_INTRO protocol. [EST_INTRO]

When a hidden service is establishing a new introduction point, it sends an ESTABLISH_INTRO cell with the following contents:

    AUTH_KEY_TYPE    [1 byte]
    AUTH_KEY_LEN     [2 bytes]
    AUTH_KEY         [AUTH_KEY_LEN bytes]
    N_EXTENSIONS     [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE [1 byte]
       EXT_FIELD_LEN  [1 byte]
       EXT_FIELD      [EXT_FIELD_LEN bytes]
    HANDSHAKE_AUTH   [MAC_LEN bytes]
    SIG_LEN          [2 bytes]
    SIG              [SIG_LEN bytes]

The AUTH_KEY_TYPE field indicates the type of the introduction point authentication key and the type of the MAC to use in HANDSHAKE_AUTH. Recognized types are:

    [00, 01] -- Reserved for legacy introduction cells; see [LEGACY_EST_INTRO] below.
    [02] -- Ed25519; SHA3-256.

The AUTH_KEY_LEN field determines the length of the AUTH_KEY field. The AUTH_KEY field contains the public introduction point authentication key, KP_hs_ipt_sid.

The EXT_FIELD_TYPE, EXT_FIELD_LEN, EXT_FIELD entries are reserved for extensions to the introduction protocol. Extensions with unrecognized EXT_FIELD_TYPE values must be ignored. (EXT_FIELD_LEN may be zero, in which case EXT_FIELD is absent.)

Unless otherwise specified in the documentation for an extension type:

  * Each extension type SHOULD be sent only once in a message.
  * Parties MUST ignore all occurrences of an extension with a given type after the first such occurrence.
  * Extensions SHOULD be sent in numerically ascending order by type.

(The above extension sorting and multiplicity rules are only defaults; they may be overridden in the descriptions of individual extensions.)
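The default extension rules above can be illustrated with a small parser and encoder. This is a sketch under the stated defaults only; the function names are hypothetical, and an extension type that overrides the defaults would need its own handling. The caller is expected to skip any returned type it does not recognize.

```python
def parse_extensions(buf, offset=0):
    """Parse N_EXTENSIONS followed by (type, len, body) entries.

    Returns (extensions, next_offset), where extensions maps each
    EXT_FIELD_TYPE to the body of its first occurrence.
    """
    if offset >= len(buf):
        raise ValueError("truncated: missing N_EXTENSIONS")
    n_extensions = buf[offset]
    offset += 1
    extensions = {}
    for _ in range(n_extensions):
        if offset + 2 > len(buf):
            raise ValueError("truncated extension header")
        ext_type, ext_len = buf[offset], buf[offset + 1]
        offset += 2
        if offset + ext_len > len(buf):
            raise ValueError("truncated extension body")
        body = buf[offset:offset + ext_len]
        offset += ext_len
        # Default multiplicity rule: later occurrences are ignored.
        extensions.setdefault(ext_type, body)
    return extensions, offset


def encode_extensions(extensions):
    """Encode a {type: body} mapping in ascending order by type."""
    out = bytearray([len(extensions)])
    for ext_type in sorted(extensions):
        body = extensions[ext_type]
        out += bytes([ext_type, len(body)]) + body
    return bytes(out)
```

A zero-length body is valid here, matching the note that EXT_FIELD_LEN may be zero.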

The HANDSHAKE_AUTH field contains the MAC of all earlier fields in the cell using as its key the shared per-circuit material ("KH") generated during the circuit extension protocol; see tor-spec.txt section 5.2, "Setting circuit keys". It prevents replays of ESTABLISH_INTRO cells.

SIG_LEN is the length of the signature.

SIG is a signature, using AUTH_KEY, of all contents of the cell, up to but not including SIG_LEN and SIG. These contents are prefixed with the string "Tor establish-intro cell v1".

Upon receiving an ESTABLISH_INTRO cell, a Tor node first decodes the key and the signature, and checks the signature. The node must reject the ESTABLISH_INTRO cell and destroy the circuit in these cases:

  * If the key type is unrecognized
  * If the key is ill-formatted
  * If the signature is incorrect
  * If the HANDSHAKE_AUTH value is incorrect
  * If the circuit is already a rendezvous circuit.
  * If the circuit is already an introduction circuit. [TODO: some scalability designs fail there.]
  * If the key is already in use by another circuit.

Otherwise, the node must associate the key with the circuit, for use later in INTRODUCE1 cells.

Denial-of-Service Defense Extension. [EST_INTRO_DOS_EXT]

This extension can be used to send Denial-of-Service (DoS) parameters to the introduction point in order for it to apply them for the introduction circuit.

If used, it needs to be encoded as one of the extension fields counted by N_EXTENSIONS in the ESTABLISH_INTRO cell defined in the previous section. The content is defined as follows:

EXT_FIELD_TYPE:

[01] -- Denial-of-Service Parameters.

If this extension type is present, the introduction point should use it to learn what values the denial-of-service subsystem should be using. The EXT_FIELD content format is:

    N_PARAMS    [1 byte]
    N_PARAMS times:
       PARAM_TYPE  [1 byte]
       PARAM_VALUE [8 bytes]

The possible PARAM_TYPE values are:

    [01] -- DOS_INTRODUCE2_RATE_PER_SEC
            The rate per second of INTRODUCE2 cells relayed to the service.

    [02] -- DOS_INTRODUCE2_BURST_PER_SEC
            The burst per second of INTRODUCE2 cells relayed to the service.

The PARAM_VALUE size is 8 bytes in order to accommodate 64-bit values. It MUST be within the specified limits for the following PARAM_TYPE values:

    [01] -- Min: 0, Max: 2147483647
    [02] -- Min: 0, Max: 2147483647

A value of 0 means the defense is disabled. If the rate per second is set to 0 (param 0x01) then the burst value should be ignored. And vice-versa, if the burst value is 0 (param 0x02), then the rate value should be ignored. In other words, setting either parameter to 0 disables the defense.

The burst can NOT be smaller than the rate. If it is, the parameters should be ignored by the introduction point.

Any valid value takes precedence over the network-wide consensus parameter.

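With both parameters present, the EXT_FIELD content of this extension is 19 bytes (1 byte of N_PARAMS plus two 9-byte parameters). Together with its EXT_FIELD_TYPE and EXT_FIELD_LEN bytes, the extension adds 21 bytes to the payload of the ESTABLISH_INTRO cell, bringing it from 134 bytes to 155 bytes.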
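As an illustration of the EXT_FIELD layout and the parameter rules described above, here is a minimal Python sketch. The function names are hypothetical. It assumes PARAM_VALUE is encoded in network (big-endian) byte order, and it treats out-of-range values the same way as an invalid burst, which is an assumption since this section does not say how they are handled.

```python
import struct

DOS_INTRODUCE2_RATE_PER_SEC = 0x01
DOS_INTRODUCE2_BURST_PER_SEC = 0x02
PARAM_MAX = 2147483647


def encode_dos_ext_field(rate, burst):
    """Return the EXT_FIELD bytes for extension type [01]."""
    for value in (rate, burst):
        if not 0 <= value <= PARAM_MAX:
            raise ValueError("parameter out of range")
    params = [(DOS_INTRODUCE2_RATE_PER_SEC, rate),
              (DOS_INTRODUCE2_BURST_PER_SEC, burst)]
    out = bytes([len(params)])  # N_PARAMS
    for param_type, param_value in params:
        # PARAM_TYPE [1 byte], PARAM_VALUE [8 bytes, assumed big-endian]
        out += struct.pack(">BQ", param_type, param_value)
    return out


def interpret_dos_params(rate, burst):
    """Return 'enabled', 'disabled', or 'ignored' for received values."""
    in_range = 0 <= rate <= PARAM_MAX and 0 <= burst <= PARAM_MAX
    if not in_range:
        return "ignored"   # assumption: treated like invalid parameters
    if rate == 0 or burst == 0:
        return "disabled"  # either parameter set to 0 disables the defense
    if burst < rate:
        return "ignored"   # the burst cannot be smaller than the rate
    return "enabled"
```

With two parameters, `encode_dos_ext_field` returns 19 bytes of EXT_FIELD content; the EXT_FIELD_TYPE and EXT_FIELD_LEN bytes are added by whatever code encodes the surrounding extension list.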

This extension can only be used with relays supporting the protocol version "HSIntro=5".

Introduced in tor-0.4.2.1-alpha.

Registering an introduction point on a legacy Tor node [LEGACY_EST_INTRO]

[This section is obsolete and refers to a workaround for now-obsolete Tor relay versions. It is included for historical reasons.]

Tor nodes should also support an older version of the ESTABLISH_INTRO cell, first documented in rend-spec.txt. New hidden service hosts must use this format when establishing introduction points at older Tor nodes that do not support the format above in [EST_INTRO].

In this older protocol, an ESTABLISH_INTRO cell contains:

    KEY_LEN         [2 bytes]
    KEY             [KEY_LEN bytes]
    HANDSHAKE_AUTH  [20 bytes]
    SIG             [variable, up to end of relay payload]

The KEY_LEN variable determines the length of the KEY field.

The KEY field is the ASN1-encoded legacy RSA public key that was also included in the hidden service descriptor.

The HANDSHAKE_AUTH field contains the SHA1 digest of (KH | "INTRODUCE").

The SIG field contains an RSA signature, using PKCS1 padding, of all earlier fields.

Older versions of Tor always use a 1024-bit RSA key for these introduction authentication keys.

Acknowledging establishment of introduction point [INTRO_ESTABLISHED]

After setting up an introduction circuit, the introduction point reports its status back to the hidden service host with an INTRO_ESTABLISHED cell.

The INTRO_ESTABLISHED cell has the following contents:

    N_EXTENSIONS [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE [1 byte]
       EXT_FIELD_LEN  [1 byte]
       EXT_FIELD      [EXT_FIELD_LEN bytes]

Older versions of Tor send back an empty INTRO_ESTABLISHED cell instead. Services must accept an empty INTRO_ESTABLISHED cell from a legacy relay. [The above paragraph is obsolete and refers to a workaround for now-obsolete Tor relay versions. It is included for historical reasons.]

The same rules for multiplicity, ordering, and handling unknown types apply to the extension fields here as described in [EST_INTRO] above.

Sending an INTRODUCE1 cell to the introduction point. [SEND_INTRO1]

In order to participate in the introduction protocol, a client must know the following:

  * An introduction point for a service.
  * The introduction authentication key for that introduction point.
  * The introduction encryption key for that introduction point.

The client sends an INTRODUCE1 cell to the introduction point, containing an identifier for the service, an identifier for the encryption key that the client intends to use, and an opaque blob to be relayed to the hidden service host.

In reply, the introduction point sends an INTRODUCE_ACK cell back to the client, either informing it that its request has been delivered, or that its request will not succeed.

[TODO: specify what tor should do when receiving a malformed cell. Drop it? Kill circuit? This goes for all possible cells.]

INTRODUCE1 cell format [FMT_INTRO1]

When a client is connecting to an introduction point, INTRODUCE1 cells should be of the form:

    LEGACY_KEY_ID   [20 bytes]
    AUTH_KEY_TYPE   [1 byte]
    AUTH_KEY_LEN    [2 bytes]
    AUTH_KEY        [AUTH_KEY_LEN bytes]
    N_EXTENSIONS    [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE [1 byte]
       EXT_FIELD_LEN  [1 byte]
       EXT_FIELD      [EXT_FIELD_LEN bytes]
    ENCRYPTED       [Up to end of relay payload]

AUTH_KEY_TYPE is defined as in [EST_INTRO]. Currently, the only value of AUTH_KEY_TYPE for this cell is an Ed25519 public key [02].

The LEGACY_KEY_ID field is used to distinguish between legacy and new style INTRODUCE1 cells. In new style INTRODUCE1 cells, LEGACY_KEY_ID is 20 zero bytes. Upon receiving an INTRODUCE1 cell, the introduction point checks the LEGACY_KEY_ID field. If LEGACY_KEY_ID is non-zero, the INTRODUCE1 cell should be handled as a legacy INTRODUCE1 cell by the intro point.

Upon receiving an INTRODUCE1 cell, the introduction point checks whether AUTH_KEY matches the introduction point authentication key for an active introduction circuit. If so, the introduction point sends an INTRODUCE2 cell with exactly the same contents to the service, and sends an INTRODUCE_ACK response to the client.

(Note that the introduction point does not "clean up" the INTRODUCE1 cells that it retransmits. Specifically, it does not change the order or multiplicity of the extensions sent by the client.)

The same rules for multiplicity, ordering, and handling unknown types apply to the extension fields here as described in [EST_INTRO] above.
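The way an introduction point might split an INTRODUCE1 body into its fields can be sketched as follows. This is an illustrative sketch with a hypothetical function name; it assumes AUTH_KEY_LEN is in network (big-endian) byte order. The extensions are kept as an ordered list, since the introduction point retransmits them without changing their order or multiplicity.

```python
def parse_introduce1(payload):
    """Split an INTRODUCE1 body into header fields and ENCRYPTED."""
    if len(payload) < 23:
        raise ValueError("truncated INTRODUCE1 cell")
    legacy_key_id = payload[:20]
    if legacy_key_id != bytes(20):
        # Non-zero LEGACY_KEY_ID: handle as a legacy INTRODUCE1 cell.
        return {"legacy": True, "legacy_key_id": legacy_key_id}
    auth_key_type = payload[20]
    auth_key_len = int.from_bytes(payload[21:23], "big")
    pos = 23
    if pos + auth_key_len + 1 > len(payload):
        raise ValueError("truncated AUTH_KEY or N_EXTENSIONS")
    auth_key = payload[pos:pos + auth_key_len]
    pos += auth_key_len
    n_extensions = payload[pos]
    pos += 1
    extensions = []
    for _ in range(n_extensions):
        if pos + 2 > len(payload):
            raise ValueError("truncated extension header")
        ext_type, ext_len = payload[pos], payload[pos + 1]
        pos += 2
        if pos + ext_len > len(payload):
            raise ValueError("truncated extension body")
        extensions.append((ext_type, payload[pos:pos + ext_len]))
        pos += ext_len
    return {
        "legacy": False,
        "auth_key_type": auth_key_type,
        "auth_key": auth_key,
        "extensions": extensions,
        "encrypted": payload[pos:],  # opaque to the introduction point
    }
```

The introduction point only needs `auth_key` to find the matching introduction circuit; everything else is forwarded unchanged in the INTRODUCE2 cell.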

INTRODUCE_ACK cell format. [INTRO_ACK]

An INTRODUCE_ACK cell has the following fields:

    STATUS       [2 bytes]
    N_EXTENSIONS [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE [1 byte]
       EXT_FIELD_LEN  [1 byte]
       EXT_FIELD      [EXT_FIELD_LEN bytes]

Recognized status values are:

    [00 00] -- Success: cell relayed to hidden service host.
    [00 01] -- Failure: service ID not recognized
    [00 02] -- Bad message format
    [00 03] -- Can't relay cell to service

The same rules for multiplicity, ordering, and handling unknown types apply to the extension fields here as described in [EST_INTRO] above.
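A client-side reading of the STATUS field might look like the sketch below. The function name and the description strings are illustrative, and the 2-byte STATUS is assumed to be in network (big-endian) byte order.

```python
INTRO_ACK_STATUS = {
    0x0000: "success: cell relayed to hidden service host",
    0x0001: "failure: service ID not recognized",
    0x0002: "bad message format",
    0x0003: "can't relay cell to service",
}


def parse_introduce_ack_status(payload):
    """Return (status, description) from an INTRODUCE_ACK body."""
    if len(payload) < 2:
        raise ValueError("truncated INTRODUCE_ACK cell")
    status = int.from_bytes(payload[:2], "big")
    return status, INTRO_ACK_STATUS.get(status, "unrecognized status")
```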

Processing an INTRODUCE2 cell at the hidden service. [PROCESS_INTRO2]

Upon receiving an INTRODUCE2 cell, the hidden service host checks whether the AUTH_KEY or LEGACY_KEY_ID field matches the keys for this introduction circuit.

The service host then checks whether it has received a cell with these contents or rendezvous cookie before. If it has, it silently drops it as a replay. (It must maintain a replay cache for as long as it accepts cells with the same encryption key. Note that the encryption format below should be non-malleable.)

If the cell is not a replay, it decrypts the ENCRYPTED field, establishes a shared key with the client, and authenticates the whole contents of the cell as having been unmodified since they left the client. There may be multiple ways of decrypting the ENCRYPTED field, depending on the chosen type of the encryption key. Requirements for an introduction handshake protocol are described in [INTRO-HANDSHAKE-REQS]. We specify one below in section [NTOR-WITH-EXTRA-DATA].

The decrypted plaintext must have the form:

    RENDEZVOUS_COOKIE                          [20 bytes]
    N_EXTENSIONS                               [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE                          [1 byte]
       EXT_FIELD_LEN                           [1 byte]
       EXT_FIELD                               [EXT_FIELD_LEN bytes]
    ONION_KEY_TYPE                             [1 byte]
    ONION_KEY_LEN                              [2 bytes]
    ONION_KEY                                  [ONION_KEY_LEN bytes]
    NSPEC      (Number of link specifiers)     [1 byte]
    NSPEC times:
       LSTYPE  (Link specifier type)           [1 byte]
       LSLEN   (Link specifier length)         [1 byte]
       LSPEC   (Link specifier)                [LSLEN bytes]
    PAD        (optional padding)              [up to end of plaintext]

Upon processing this plaintext, the hidden service makes sure that any required authentication is present in the extension fields, and then extends a rendezvous circuit to the node described in the LSPEC fields, using the ONION_KEY to complete the extension. As mentioned in [BUILDING-BLOCKS], the "TLS-over-TCP, IPv4" and "Legacy node identity" specifiers must be present.

As of 0.4.1.1-alpha, clients include both IPv4 and IPv6 link specifiers in INTRODUCE1 cells. All available addresses SHOULD be included in the cell, regardless of the address that the client actually used to extend to the rendezvous point.

The hidden service should handle invalid or unrecognised link specifiers the same way as clients do in section 2.5.2.2. In particular, services SHOULD perform basic validity checks on link specifiers, and SHOULD NOT reject unrecognised link specifiers, to avoid information leaks. The list of link specifiers received here SHOULD either be rejected, or sent verbatim when extending to the rendezvous point, in the same order received.

The service MAY reject the list of link specifiers if it is inconsistent with relay information from the directory, but SHOULD NOT modify it.

The ONION_KEY_TYPE field is:

[01] NTOR: ONION_KEY is 32 bytes long.

The ONION_KEY field describes the onion key that must be used when extending to the rendezvous point. It must be of a type listed as supported in the hidden service descriptor.

When using a legacy introduction point, the INTRODUCE cells must be padded to a certain length using the PAD field in the encrypted portion.

Upon receiving a well-formed INTRODUCE2 cell, the hidden service host will have:

  * The information needed to connect to the client's chosen rendezvous point.
  * The second half of a handshake to authenticate and establish a shared key with the hidden service client.
  * A set of shared keys to use for end-to-end encryption.

The same rules for multiplicity, ordering, and handling unknown types apply to the extension fields here as described in [EST_INTRO] above.

Introduction handshake encryption requirements [INTRO-HANDSHAKE-REQS]

When decoding the encrypted information in an INTRODUCE2 cell, a hidden service host must be able to:

  * Decrypt additional information included in the INTRODUCE2 cell, to include the rendezvous token and the information needed to extend to the rendezvous point.
  * Establish a set of shared keys for use with the client.
  * Authenticate that the cell has not been modified since the client generated it.

Note that the old TAP-derived protocol of the previous hidden service design achieved the first two requirements, but not the third.

Example encryption handshake: ntor with extra data [NTOR-WITH-EXTRA-DATA]

[TODO: relocate this]

This is a variant of the ntor handshake (see tor-spec.txt, section 5.1.4; see proposal 216; and see "Anonymity and one-way authentication in key-exchange protocols" by Goldberg, Stebila, and Ustaoglu).

It behaves the same as the ntor handshake, except that, in addition to negotiating forward secure keys, it also provides a means for encrypting non-forward-secure data to the server (in this case, to the hidden service host) as part of the handshake.

Notation here is as in section 5.1.4 of tor-spec.txt, which defines the ntor handshake.

The PROTOID for this variant is "tor-hs-ntor-curve25519-sha3-256-1". We also use the following tweak values:

    t_hsenc    = PROTOID | ":hs_key_extract"
    t_hsverify = PROTOID | ":hs_verify"
    t_hsmac    = PROTOID | ":hs_mac"
    m_hsexpand = PROTOID | ":hs_key_expand"

To make an INTRODUCE1 cell, the client must know a public encryption key B for the hidden service on this introduction circuit. The client generates a single-use keypair:

x,X = KEYGEN()

and computes:

    intro_secret_hs_input = EXP(B,x) | AUTH_KEY | X | B | PROTOID
    info = m_hsexpand | N_hs_subcred
    hs_keys = KDF(intro_secret_hs_input | t_hsenc | info, S_KEY_LEN+MAC_LEN)
    ENC_KEY = hs_keys[0:S_KEY_LEN]
    MAC_KEY = hs_keys[S_KEY_LEN:S_KEY_LEN+MAC_KEY_LEN]

and sends, as the ENCRYPTED part of the INTRODUCE1 cell:

    CLIENT_PK       [PK_PUBKEY_LEN bytes]
    ENCRYPTED_DATA  [Padded to length of plaintext]
    MAC             [MAC_LEN bytes]

Substituting those fields into the INTRODUCE1 cell body format described in [FMT_INTRO1] above, we have

    LEGACY_KEY_ID   [20 bytes]
    AUTH_KEY_TYPE   [1 byte]
    AUTH_KEY_LEN    [2 bytes]
    AUTH_KEY        [AUTH_KEY_LEN bytes]
    N_EXTENSIONS    [1 byte]
    N_EXTENSIONS times:
       EXT_FIELD_TYPE [1 byte]
       EXT_FIELD_LEN  [1 byte]
       EXT_FIELD      [EXT_FIELD_LEN bytes]
    ENCRYPTED:
       CLIENT_PK      [PK_PUBKEY_LEN bytes]
       ENCRYPTED_DATA [Padded to length of plaintext]
       MAC            [MAC_LEN bytes]

(This format is as documented in [FMT_INTRO1] above, except that here we describe how to build the ENCRYPTED portion.)

Here, the encryption key plays the role of B in the regular ntor handshake, and the AUTH_KEY field plays the role of the node ID. The CLIENT_PK field is the public key X. The ENCRYPTED_DATA field is the message plaintext, encrypted with the symmetric key ENC_KEY. The MAC field is a MAC of all of the cell from the AUTH_KEY through the end of ENCRYPTED_DATA, using the MAC_KEY value as its key.

To process this format, the hidden service checks PK_VALID(CLIENT_PK) as necessary, and then computes ENC_KEY and MAC_KEY as the client did above, except using EXP(CLIENT_PK,b) in the calculation of intro_secret_hs_input. The service host then checks whether the MAC is correct. If it is invalid, it drops the cell. Otherwise, it computes the plaintext by decrypting ENCRYPTED_DATA.

The hidden service host now completes the service side of the extended ntor handshake, as described in tor-spec.txt section 5.1.4, with the modified PROTOID as given above. To be explicit, the hidden service host generates a keypair of y,Y = KEYGEN(), and uses its introduction point encryption key 'b' to compute:

    intro_secret_hs_input = EXP(X,b) | AUTH_KEY | X | B | PROTOID
    info = m_hsexpand | N_hs_subcred
    hs_keys = KDF(intro_secret_hs_input | t_hsenc | info, S_KEY_LEN+MAC_LEN)
    HS_DEC_KEY = hs_keys[0:S_KEY_LEN]
    HS_MAC_KEY = hs_keys[S_KEY_LEN:S_KEY_LEN+MAC_KEY_LEN]

(The above are used to check the MAC and then decrypt the encrypted data.)

    rend_secret_hs_input = EXP(X,y) | EXP(X,b) | AUTH_KEY | B | X | Y | PROTOID
    NTOR_KEY_SEED = MAC(rend_secret_hs_input, t_hsenc)
    verify = MAC(rend_secret_hs_input, t_hsverify)
    auth_input = verify | AUTH_KEY | B | Y | X | PROTOID | "Server"
    AUTH_INPUT_MAC = MAC(auth_input, t_hsmac)

(The above are used to finish the ntor handshake.)

The server's handshake reply is:

    SERVER_PK   Y                 [PK_PUBKEY_LEN bytes]
    AUTH        AUTH_INPUT_MAC    [MAC_LEN bytes]
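The symmetric part of these computations can be sketched in Python. This is an illustrative sketch, not a reference implementation. It assumes that H is SHA3-256, that KDF is SHAKE-256, that MAC(k, m) is H(htonll(len(k)) | k | m), and that S_KEY_LEN and MAC_LEN are both 32 bytes; none of these is defined in this section. The Curve25519 results EXP(...) are passed in as byte strings because the Python standard library does not provide Curve25519, and the function names are hypothetical.

```python
import hashlib
import struct

PROTOID = b"tor-hs-ntor-curve25519-sha3-256-1"
T_HSENC = PROTOID + b":hs_key_extract"
T_HSVERIFY = PROTOID + b":hs_verify"
T_HSMAC = PROTOID + b":hs_mac"
M_HSEXPAND = PROTOID + b":hs_key_expand"
S_KEY_LEN = MAC_LEN = 32  # assumed lengths


def mac(key, msg):
    """Assumed MAC construction: H(htonll(len(key)) | key | msg)."""
    return hashlib.sha3_256(
        struct.pack(">Q", len(key)) + key + msg).digest()


def intro_keys(exp_shared, auth_key, X, B, subcred):
    """Keys for ENCRYPTED_DATA; exp_shared is EXP(B,x), equal to EXP(X,b)."""
    secret = exp_shared + auth_key + X + B + PROTOID
    info = M_HSEXPAND + subcred
    hs_keys = hashlib.shake_256(
        secret + T_HSENC + info).digest(S_KEY_LEN + MAC_LEN)
    return hs_keys[:S_KEY_LEN], hs_keys[S_KEY_LEN:]


def rend_handshake(exp_xy, exp_xb, auth_key, B, X, Y):
    """Return (NTOR_KEY_SEED, AUTH_INPUT_MAC).

    exp_xy is EXP(X,y) on the service or EXP(Y,x) on the client;
    exp_xb is EXP(X,b) on the service or EXP(B,x) on the client.
    """
    secret = exp_xy + exp_xb + auth_key + B + X + Y + PROTOID
    ntor_key_seed = mac(secret, T_HSENC)
    verify = mac(secret, T_HSVERIFY)
    auth_input = verify + auth_key + B + Y + X + PROTOID + b"Server"
    return ntor_key_seed, mac(auth_input, T_HSMAC)
```

Both sides call the same functions with their own view of the Diffie-Hellman outputs, so the client can compare the AUTH value it receives against the AUTH_INPUT_MAC it computes locally.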

These fields will be sent to the client in a RENDEZVOUS1 cell using the HANDSHAKE_INFO element (see [JOIN_REND]).

The hidden service host now also knows the keys generated by the handshake, which it will use to encrypt and authenticate data end-to-end between the client and the server. These keys are as computed in tor-spec.txt section 5.1.4, except that instead of using AES-128 and SHA1 for this hop, we use AES-256 and SHA3-256.

Authentication during the introduction phase. [INTRO-AUTH]

Hidden services may restrict access only to authorized users. One mechanism to do so is the credential mechanism, where only users who know the credential for a hidden service may connect at all.

There is one defined authentication type: ed25519.

Ed25519-based authentication ed25519.

To authenticate with an Ed25519 private key, the user must include an extension field in the encrypted part of the INTRODUCE1 cell with an EXT_FIELD_TYPE type of [02] and the contents:

    Nonce     [16 bytes]
    Pubkey    [32 bytes]
    Signature [64 bytes]

Nonce is a random value. Pubkey is the public key that will be used to authenticate. [TODO: should this be an identifier for the public key instead?] Signature is the signature, using Ed25519, of:

    "hidserv-userauth-ed25519"
    Nonce       (same as above)
    Pubkey      (same as above)
    AUTH_KEY    (As in the INTRODUCE1 cell)

The hidden service host checks this by seeing whether it recognizes and would accept a signature from the provided public key. If it would, then it checks whether the signature is correct. If it is, then the correct user has authenticated.

Replay prevention on the whole cell is sufficient to prevent replays on the authentication.

Users SHOULD NOT use the same public key with multiple hidden services.

The rendezvous protocol

Before connecting to a hidden service, the client first builds a circuit to an arbitrarily chosen Tor node (known as the rendezvous point), and sends an ESTABLISH_RENDEZVOUS cell. The hidden service later connects to the same node and sends a RENDEZVOUS cell. Once this has occurred, the relay forwards the contents of the RENDEZVOUS cell to the client, and joins the two circuits together.

Single Onion Services attempt to build a non-anonymous single-hop circuit, but use an anonymous 3-hop circuit if:

  * the rend point is on an address that is configured as unreachable via a direct connection, or
  * the initial attempt to connect to the rend point over a single-hop circuit fails, and they are retrying the rend point connection.

Establishing a rendezvous point [EST_REND_POINT]

The client sends the rendezvous point a RELAY_COMMAND_ESTABLISH_RENDEZVOUS cell containing a 20-byte value.

RENDEZVOUS_COOKIE [20 bytes]

Rendezvous points MUST ignore any extra bytes in an ESTABLISH_RENDEZVOUS cell. (Older versions of Tor did not.)

The rendezvous cookie is an arbitrary 20-byte value, chosen randomly by the client. The client SHOULD choose a new rendezvous cookie for each new connection attempt. If the rendezvous cookie is already in use on an existing circuit, the rendezvous point should reject it and destroy the circuit.

Upon receiving an ESTABLISH_RENDEZVOUS cell, the rendezvous point associates the cookie with the circuit on which it was sent. It replies to the client with an empty RENDEZVOUS_ESTABLISHED cell to indicate success. Clients MUST ignore any extra bytes in a RENDEZVOUS_ESTABLISHED cell.
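The cookie handling described above could be tracked at a rendezvous point with a structure like the following sketch. The class, its method names, and the use of an opaque circuit identifier are all hypothetical; the sketch only illustrates the rules for ignoring extra bytes, rejecting a cookie that is already in use, and matching a later cookie against a waiting circuit.

```python
import os

COOKIE_LEN = 20


def new_rendezvous_cookie():
    """Client side: a fresh random cookie for each connection attempt."""
    return os.urandom(COOKIE_LEN)


class RendezvousTable:
    """Cookies waiting for the service side, keyed by cookie value."""

    def __init__(self):
        self._waiting = {}

    def establish(self, cell_body, circuit_id):
        """Handle ESTABLISH_RENDEZVOUS; return True on success.

        A False result means the cookie is already in use, and the
        caller should destroy the circuit.
        """
        if len(cell_body) < COOKIE_LEN:
            raise ValueError("truncated ESTABLISH_RENDEZVOUS cell")
        cookie = bytes(cell_body[:COOKIE_LEN])  # extra bytes are ignored
        if cookie in self._waiting:
            return False
        self._waiting[cookie] = circuit_id
        return True

    def join(self, cookie):
        """Return the waiting circuit for this cookie, or None."""
        return self._waiting.pop(bytes(cookie), None)
```

Removing the entry in `join` means a cookie can connect at most one pair of circuits.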

The client MUST NOT use the circuit which sent the cell for any purpose other than rendezvous with the given location-hidden service.

The client should establish a rendezvous point BEFORE trying to connect to a hidden service.

Joining to a rendezvous point [JOIN_REND]

To complete a rendezvous, the hidden service host builds a circuit to the rendezvous point and sends a RENDEZVOUS1 cell containing:

    RENDEZVOUS_COOKIE  [20 bytes]
    HANDSHAKE_INFO     [variable; depends on handshake type used.]

where RENDEZVOUS_COOKIE is the cookie suggested by the client during the introduction (see [PROCESS_INTRO2]) and HANDSHAKE_INFO is defined in [NTOR-WITH-EXTRA-DATA].

If the cookie matches the rendezvous cookie set on any not-yet-connected circuit on the rendezvous point, the rendezvous point connects the two circuits, and sends a RENDEZVOUS2 cell to the client containing the HANDSHAKE_INFO field of the RENDEZVOUS1 cell.

Upon receiving the RENDEZVOUS2 cell, the client verifies that HANDSHAKE_INFO correctly completes a handshake. To do so, the client parses SERVER_PK from HANDSHAKE_INFO and reverses the final operations of section [NTOR-WITH-EXTRA-DATA] as shown here:

    rend_secret_hs_input = EXP(Y,x) | EXP(B,x) | AUTH_KEY | B | X | Y | PROTOID
    NTOR_KEY_SEED = MAC(rend_secret_hs_input, t_hsenc)
    verify = MAC(rend_secret_hs_input, t_hsverify)
    auth_input = verify | AUTH_KEY | B | Y | X | PROTOID | "Server"
    AUTH_INPUT_MAC = MAC(auth_input, t_hsmac)

Finally the client verifies that the received AUTH field of HANDSHAKE_INFO is equal to the computed AUTH_INPUT_MAC.

Now both parties use the handshake output to derive shared keys for use on the circuit as specified in the section below:

Key expansion

The hidden service and its client need to derive crypto keys from the NTOR_KEY_SEED part of the handshake output. To do so, they use the KDF construction as follows:

K = KDF(NTOR_KEY_SEED | m_hsexpand, HASH_LEN * 2 + S_KEY_LEN * 2)

The first HASH_LEN bytes of K form the forward digest Df; the next HASH_LEN bytes form the backward digest Db; the next S_KEY_LEN bytes form Kf, and the final S_KEY_LEN bytes form Kb. Excess bytes from K are discarded.
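The split of K into the four values can be sketched as follows, assuming KDF is SHAKE-256 and that HASH_LEN and S_KEY_LEN are both 32 bytes (these values are not stated in this section). The function name is hypothetical.

```python
import hashlib

# Assumed lengths: SHA3-256 digests and AES-256 keys.
HASH_LEN = 32
S_KEY_LEN = 32
M_HSEXPAND = b"tor-hs-ntor-curve25519-sha3-256-1" + b":hs_key_expand"


def expand_rend_keys(ntor_key_seed):
    """Return (Df, Db, Kf, Kb) derived from NTOR_KEY_SEED."""
    total = HASH_LEN * 2 + S_KEY_LEN * 2
    # K = KDF(NTOR_KEY_SEED | m_hsexpand, HASH_LEN * 2 + S_KEY_LEN * 2)
    k = hashlib.shake_256(ntor_key_seed + M_HSEXPAND).digest(total)
    df = k[:HASH_LEN]
    db = k[HASH_LEN:2 * HASH_LEN]
    kf = k[2 * HASH_LEN:2 * HASH_LEN + S_KEY_LEN]
    kb = k[2 * HASH_LEN + S_KEY_LEN:]
    return df, db, kf, kb
```

Since exactly the required number of bytes is requested from the KDF here, there are no excess bytes to discard.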

Subsequently, the rendezvous point passes relay cells, unchanged, from each of the two circuits to the other. When Alice's OP sends RELAY cells along the circuit, it authenticates them with Df and encrypts them with Kf, then with all of the keys for the ORs in Alice's side of the circuit; and when Alice's OP receives RELAY cells from the circuit, it decrypts them with the keys for the ORs in Alice's side of the circuit, then decrypts them with Kb, and checks integrity with Db. Bob's OP does the same, with Kf and Kb interchanged.

[TODO: Should we encrypt HANDSHAKE_INFO as we did INTRODUCE2 contents? It's not necessary, but it could be wise. Similarly, we should make it extensible.]

Using legacy hosts as rendezvous points

[This section is obsolete and refers to a workaround for now-obsolete Tor relay versions. It is included for historical reasons.]

The behavior of ESTABLISH_RENDEZVOUS is unchanged from older versions of this protocol, except that relays should now ignore unexpected bytes at the end.

Old versions of Tor required that RENDEZVOUS cell payloads be exactly 168 bytes long. All shorter rendezvous payloads should be padded to this length with random bytes, to make them difficult to distinguish from older protocols at the rendezvous point.

Relays older than 0.2.9.1 should not be used as rendezvous points by next generation onion services because they enforce too-strict length checks on rendezvous cells. Hence the "HSRend" protocol from proposal 264 should be used to select relays for rendezvous points.

Encrypting data between client and host

A successfully completed handshake, as embedded in the INTRODUCE/RENDEZVOUS cells, gives the client and hidden service host a shared set of keys Kf, Kb, Df, Db, which they use for end-to-end traffic encryption and authentication as in the regular Tor relay encryption protocol, applying encryption with these keys before other encryption, and decrypting with these keys after other decryption. The client encrypts with Kf and decrypts with Kb; the service host does the opposite.

Encoding onion addresses [ONIONADDRESS]

The onion address of a hidden service includes its identity public key, a version field and a basic checksum. All this information is then base32 encoded as shown below:

    onion_address = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
    CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2]

    where:
      - PUBKEY is the 32 bytes ed25519 master pubkey of the hidden service.
      - VERSION is a one byte version field (default value '\x03')
      - ".onion checksum" is a constant string
      - CHECKSUM is truncated to two bytes before inserting it in onion_address

Here are a few example addresses:

    pg6mmjiyjmcrsslvykfwnntlaru7p5svn6y2ymmju6nubxndf4pscryd.onion
    sp3k262uwy4r2k3ycr5awluarykdpag6a7y33jxop4cs2lu5uz5sseqd.onion
    xa4r2iadxm55fbnqgwwi5mymqdcofiu3w6rpbtqn7b2dyn7mgwj64jyd.onion
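The encoding above can be sketched in Python using only the standard library. The sketch assumes H is SHA3-256, which is not stated in this section, and emits the base32 output in lowercase to match the example addresses. The function names are hypothetical.

```python
import base64
import hashlib

CHECKSUM_CONSTANT = b".onion checksum"


def _checksum(pubkey, version_byte):
    # CHECKSUM = H(".onion checksum" | PUBKEY | VERSION)[:2]
    data = CHECKSUM_CONSTANT + pubkey + version_byte
    return hashlib.sha3_256(data).digest()[:2]


def encode_onion_address(pubkey, version=3):
    """Return the .onion address for a 32-byte ed25519 public key."""
    if len(pubkey) != 32:
        raise ValueError("PUBKEY must be 32 bytes")
    version_byte = bytes([version])
    raw = pubkey + _checksum(pubkey, version_byte) + version_byte
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def decode_onion_address(address):
    """Return (pubkey, version), or raise ValueError if invalid."""
    address = address.lower()
    if not address.endswith(".onion"):
        raise ValueError("missing .onion suffix")
    label = address[:-len(".onion")]
    if len(label) != 56:
        raise ValueError("unexpected address length")
    raw = base64.b32decode(label.upper())
    pubkey, checksum, version_byte = raw[:32], raw[32:34], raw[34:]
    if checksum != _checksum(pubkey, version_byte):
        raise ValueError("checksum mismatch")
    return pubkey, version_byte[0]
```

The 35 bytes of PUBKEY, CHECKSUM and VERSION encode to exactly 56 base32 characters with no padding, so every address with version byte '\x03' ends in "d".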

For more information about this encoding, please see our discussion thread at [ONIONADDRESS-REFS].

Open Questions:

Scaling hidden services is hard. There are on-going discussions that you might be able to help with. See [SCALING-REFS].

How can we improve the HSDir unpredictability design proposed in [SHAREDRANDOM]? See [SHAREDRANDOM-REFS] for discussion.

How can hidden service addresses become memorable while retaining their self-authenticating and decentralized nature? See [HUMANE-HSADDRESSES-REFS] for some proposals; many more are possible.

Hidden services are pretty slow, both because of the lengthy setup procedure and because the final circuit has 6 hops. How can we make the Hidden Service protocol faster? See [PERFORMANCE-REFS] for some suggestions.

References:

[KEYBLIND-REFS]:
    https://trac.torproject.org/projects/tor/ticket/8106
    https://lists.torproject.org/pipermail/tor-dev/2012-September/004026.html

[KEYBLIND-PROOF]:
    https://lists.torproject.org/pipermail/tor-dev/2013-December/005943.html

[SHAREDRANDOM-REFS]:
    https://gitweb.torproject.org/torspec.git/tree/proposals/250-commit-reveal-consensus.txt
    https://trac.torproject.org/projects/tor/ticket/8244

[SCALING-REFS]:
    https://lists.torproject.org/pipermail/tor-dev/2013-October/005556.html

[HUMANE-HSADDRESSES-REFS]:
    https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-onion-nyms.txt
    http://archives.seul.org/or/dev/Dec-2011/msg00034.html

[PERFORMANCE-REFS]:
    "Improving Efficiency and Simplicity of Tor circuit establishment and hidden services" by Overlier, L., and P. Syverson

    [TODO: Need more here! Do we have any? :( ]

[ATTACK-REFS]:
    "Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization" by Alex Biryukov, Ivan Pustogarov, Ralf-Philipp Weinmann

    "Locating Hidden Servers" by Lasse Øverlier and Paul Syverson

[ED25519-REFS]:
    "High-speed high-security signatures" by Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang.
    http://cr.yp.to/papers.html#ed25519

[ED25519-B-REF]:
    https://tools.ietf.org/html/draft-josefsson-eddsa-ed25519-03#section-5

[PRNG-REFS]:
    http://projectbullrun.org/dual-ec/ext-rand.html
    https://lists.torproject.org/pipermail/tor-dev/2015-November/009954.html

[SRV-TP-REFS]:
    https://lists.torproject.org/pipermail/tor-dev/2016-April/010759.html

[VANITY-REFS]:
    https://github.com/Yawning/horse25519

[ONIONADDRESS-REFS]:
    https://lists.torproject.org/pipermail/tor-dev/2017-January/011816.html

[TORSION-REFS]:
    https://lists.torproject.org/pipermail/tor-dev/2017-April/012164.html
    https://getmonero.org/2017/05/17/disclosure-of-a-major-bug-in-cryptonote-based-currencies.html

Appendix A: Signature scheme with key blinding [KEYBLIND]

Key derivation overview

As described in [IMD:DIST] and [SUBCRED] above, we require a "key blinding" system that works (roughly) as follows:

There is a master keypair (sk, pk).

Given the keypair and a nonce n, there is a derivation function that gives a new blinded keypair (sk_n, pk_n). This keypair can be used for signing.

Given only the public key and the nonce, there is a function that gives pk_n.

Without knowing pk, it is not possible to derive pk_n; without knowing sk, it is not possible to derive sk_n.

It's possible to check that a signature was made with sk_n while knowing only pk_n.

Someone who sees a large number of blinded public keys and signatures made using those public keys can't tell which signatures and which blinded keys were derived from the same master keypair.

You can't forge signatures.

[TODO: Insert a more rigorous definition and better references.]

Tor's key derivation scheme

We propose the following scheme for key blinding, based on Ed25519.

(This is an ECC group, so remember that scalar multiplication is the trapdoor function, and it's defined in terms of iterated point addition. See the Ed25519 paper [Reference ED25519-REFS] for a fairly clear writeup.)

Let B be the ed25519 basepoint as found in section 5 of [ED25519-B-REF]:

B = (15112221349535400772501151409588531511454012693041857206046113283949847762202, 46316835694926478169428394003475163141307993866256225615783033603165251855960)

Assume B has prime order l, so lB=0. Let a master keypair be written as (a,A), where a is the private key and A is the public key (A=aB).

To derive the key for a nonce N and an optional secret s, compute the blinding factor like this:

    h = H(BLIND_STRING | A | s | B | N)
    BLIND_STRING = "Derive temporary signing key" | INT_1(0)
    N = "key-blind" | INT_8(period-number) | INT_8(period_length)
    B = "(1511[...]2202, 4631[...]5960)"

then clamp the blinding factor 'h' according to the ed25519 spec:

    h[0] &= 248;
    h[31] &= 63;
    h[31] |= 64;

and do the key derivation as follows:

    private key for the period:

        a' = h a mod l
        RH' = SHA-512(RH_BLIND_STRING | RH)[:32]
        RH_BLIND_STRING = "Derive temporary signing key hash input"

    public key for the period:

        A' = h A = (ha)B
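As an illustration, the blinding factor computation and clamping can be sketched as follows. This sketch assumes that H is SHA3-256 and that INT_n denotes an n-byte big-endian integer, both of which are defined elsewhere in this specification rather than in this appendix. It also assumes that the string B is the full decimal representation of the basepoint given above, and that an absent secret s is the empty string. The function name is hypothetical, and the sketch stops at h; it does not perform the scalar or point multiplication.

```python
import hashlib

# Decimal string form of the Ed25519 basepoint, as hashed into h.
BASEPOINT_STR = (
    b"(15112221349535400772501151409588531511454012693041857206046113283949847762202, "
    b"46316835694926478169428394003475163141307993866256225615783033603165251855960)"
)
BLIND_STRING = b"Derive temporary signing key" + b"\x00"  # ... | INT_1(0)


def blinding_factor(A: bytes, period_number: int, period_length: int,
                    s: bytes = b"") -> int:
    """Return the clamped blinding factor h as an integer scalar."""
    # N = "key-blind" | INT_8(period-number) | INT_8(period_length)
    N = (b"key-blind"
         + period_number.to_bytes(8, "big")
         + period_length.to_bytes(8, "big"))
    # Assumption: H is SHA3-256.
    h = bytearray(hashlib.sha3_256(BLIND_STRING + A + s + BASEPOINT_STR + N).digest())
    # Clamp as in the ed25519 spec.
    h[0] &= 248
    h[31] &= 63
    h[31] |= 64
    # Ed25519 scalars are little-endian.
    return int.from_bytes(h, "little")
```

After clamping, h is a multiple of 8 with bit 254 set and bit 255 clear. The blinded keys are then a' = h a mod l and A' = h A.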

Generating a signature of M: given a deterministic random-looking r (see EdDSA paper), take R=rB, S=r+hash(R,A',M)ah mod l. Send signature (R,S) and public key A'.

Verifying the signature: Check whether SB = R+hash(R,A',M)A'.

(If the signature is valid, SB = (r + hash(R,A',M)ah)B = rB + (hash(R,A',M)ah)B = R + hash(R,A',M)A' ) This boils down to regular Ed25519 with key pair (a', A').

See [KEYBLIND-REFS] for an extensive discussion on this scheme and possible alternatives. Also, see [KEYBLIND-PROOF] for a security proof of this scheme.

Appendix B: Selecting nodes [PICKNODES]

    Picking introduction points
    Picking rendezvous points
    Building paths
    Reusing circuits

(TODO: This needs a writeup)

Appendix C: Recommendations for searching for vanity .onions [VANITY]

EDITORIAL NOTE: The author thinks that it's silly to brute-force the keyspace for a key that, when base-32 encoded, spells out the name of your website. It also feels a bit dangerous to me. If you train your users to connect to

llamanymityx4fi3l6x2gyzmtmgxjyqyorj9qsb5r543izcwymle.onion

I worry that you're making it easier for somebody to trick them into connecting to

llamanymityb4sqi0ta0tsw6uovyhwlezkcrmczeuzdvfauuemle.onion

Nevertheless, people are probably going to try to do this, so here's a decent algorithm to use.

To search for a public key with some criterion X:

Generate a random (sk,pk) pair.

While pk does not satisfy X:

    Add the number 8 to sk
    Add the point 8*B to pk

Return sk, pk.

We add 8 and 8*B, rather than 1 and B, so that sk is always a valid Curve25519 private key, with the lowest 3 bits equal to 0.
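The search loop can be sketched in pure Python as shown below. This is a slow, non-constant-time illustration only. It treats sk as the expanded Ed25519 secret scalar (an integer) instead of a key seed, uses textbook affine twisted Edwards addition, and takes the criterion X as a hypothetical callback on the 32-byte encoded public key. All function names are assumptions made for this example; see [VANITY-REFS] for a real implementation.

```python
P = 2**255 - 19
D = (-121665 * pow(121666, -1, P)) % P  # Edwards curve constant d
B = (15112221349535400772501151409588531511454012693041857206046113283949847762202,
     46316835694926478169428394003475163141307993866256225615783033603165251855960)


def point_add(p1, p2):
    """Affine addition on -x^2 + y^2 = 1 + d*x^2*y^2 (complete formula)."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * pow((1 + t) % P, -1, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow((1 - t) % P, -1, P) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add; (0, 1) is the neutral element."""
    result = (0, 1)
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def encode_point(point):
    """32-byte little-endian y, with the low bit of x in the top bit."""
    x, y = point
    return (y | ((x & 1) << 255)).to_bytes(32, "little")


def vanity_search(start_sk, satisfies):
    """Step sk by 8 and pk by 8*B until satisfies(encoded pk) is true."""
    sk = start_sk
    pk = scalar_mult(sk, B)
    step = scalar_mult(8, B)
    while not satisfies(encode_point(pk)):
        sk += 8
        pk = point_add(pk, step)
    return sk, pk
```

For example, `vanity_search(start, lambda enc: enc[0] < 8)` looks for a key whose base32 encoding begins with "a". The starting scalar should be freshly generated and a multiple of 8, and only the returned pair should be kept.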

This algorithm is safe [source: djb, personal communication] [TODO: Make sure I understood correctly!] so long as only the final (sk,pk) pair is used, and all previous values are discarded.

To parallelize this algorithm, start with an independent (sk,pk) pair generated for each independent thread, and let each search proceed independently.

See [VANITY-REFS] for a reference implementation of this vanity .onion search scheme.

Appendix D: Numeric values reserved in this document

[TODO: collect all the lists of commands and values mentioned above]

Appendix E: Reserved numbers

We reserve these certificate type values for Ed25519 certificates:

    [08] short-term descriptor signing key, signed with blinded public key. (Section 2.4)
    [09] intro point authentication key, cross-certifying the descriptor signing key. (Section 2.5)
    [0B] ed25519 key derived from the curve25519 intro point encryption key, cross-certifying the descriptor signing key. (Section 2.5)

Note: The value "0A" is skipped because it's reserved for the onion key cross-certifying ntor identity key from proposal 228.

Appendix F: Hidden service directory format [HIDSERVDIR-FORMAT]

This appendix section specifies the contents of the HiddenServiceDir directory:

"hostname" [FILE]

This file contains the onion address of the onion service.

"private_key_ed25519" [FILE]

This file contains the private master ed25519 key of the onion service. [TODO: Offline keys]

"./authorized_clients/" [DIRECTORY]

"./authorized_clients/alice.auth" [FILE]

"./authorized_clients/bob.auth" [FILE]

"./authorized_clients/charlie.auth" [FILE]

If client authorization is enabled, this directory MUST contain a ".auth" file for each authorized client. Each such file contains the public key of the respective client. The files are transmitted to the service operator by the client.

See section [CLIENT-AUTH-MGMT] for more details and the format of the client file.

(NOTE: client authorization is implemented as of 0.3.5.1-alpha.)

Appendix G: Managing authorized client data [CLIENT-AUTH-MGMT]

Hidden services and clients can configure their authorized client data either using the torrc, or using the control port. This section presents a suggested scheme for configuring client authorization. Please see appendix [HIDSERVDIR-FORMAT] for more information about relevant hidden service files.

(NOTE: client authorization is implemented as of 0.3.5.1-alpha.)

G.1. Configuring client authorization using torrc

G.1.1. Hidden Service side configuration

A hidden service that wants to enable client authorization needs to populate the "authorized_clients/" directory of its HiddenServiceDir directory with the ".auth" files of its authorized clients.

When Tor starts up with a configured onion service, Tor checks its <HiddenServiceDir>/authorized_clients/ directory for ".auth" files, and if any recognized and parseable such files are found, then client authorization becomes activated for that service.

G.1.2. Service-side bookkeeping

This section contains more details on how onion services should be keeping track of their client ".auth" files.

For the "descriptor" authentication type, the ".auth" file MUST contain the x25519 public key of that client. Here is a suggested file format:

    <auth-type>:<key-type>:<base32-encoded-public-key>

Here is an example:

    descriptor:x25519:OM7TGIVRYMY6PFX6GAC6ATRTA5U6WW6U7A4ZNHQDI6OVL52XVV2Q

Tor SHOULD ignore lines it does not recognize. Tor SHOULD ignore files that don't use the ".auth" suffix.

G.1.3. Client side configuration

A client who wants to register client authorization data for onion services needs to add the following line to their torrc to indicate the directory which hosts ".auth_private" files containing client-side credentials for onion services:

    ClientOnionAuthDir <DIR>

The <DIR> contains a file with the suffix ".auth_private" for each onion service the client is authorized with. Tor should scan the directory for ".auth_private" files to find which onion services require client authorization from this client.

For the "descriptor" auth-type, a ".auth_private" file contains the private x25519 key:

    <onion-address>:descriptor:x25519:<base32-encoded-privkey>

The keypair used for client authorization is created by a third party tool for which the public key needs to be transferred to the service operator in a secure out-of-band way. The third party tool SHOULD add appropriate headers to the private key file to ensure that users won't accidentally give out their private key.
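A minimal sketch of parsing the suggested ".auth" and ".auth_private" line formats follows. It assumes the keys are unpadded base32 encodings of 32-byte x25519 keys, as the example key suggests, and it returns None for unrecognized lines in the spirit of "Tor SHOULD ignore lines it does not recognize". The function names are hypothetical.

```python
import base64


def _decode_key(encoded: str) -> bytes:
    """Decode an unpadded base32 key and check that it is 32 bytes long."""
    padded = encoded.upper() + "=" * (-len(encoded) % 8)
    key = base64.b32decode(padded)
    if len(key) != 32:
        raise ValueError("expected a 32-byte x25519 key")
    return key


def parse_auth_line(line: str):
    """Parse '<auth-type>:<key-type>:<base32-encoded-public-key>'."""
    parts = line.strip().split(":")
    if len(parts) != 3 or parts[:2] != ["descriptor", "x25519"]:
        return None  # unrecognized line: ignore it
    try:
        return _decode_key(parts[2])
    except ValueError:
        return None


def parse_auth_private_line(line: str):
    """Parse '<onion-address>:descriptor:x25519:<base32-encoded-privkey>'."""
    parts = line.strip().split(":")
    if len(parts) != 4 or parts[1:3] != ["descriptor", "x25519"]:
        return None
    try:
        return parts[0], _decode_key(parts[3])
    except ValueError:
        return None
```

A service would apply `parse_auth_line` to every file in "authorized_clients/" whose name ends in ".auth", and enable client authorization if at least one key is recovered.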
G.2. Configuring client authorization using the control port

G.2.1. Service side

A hidden service also has the option to configure authorized clients using the control port. The idea is that hidden service operators can use controller utilities that manage their access control instead of using the filesystem to register client keys.

Specifically, we require a new control port command ADD_ONION_CLIENT_AUTH which is able to register x25519/ed25519 public keys tied to a specific authorized client. [XXX figure out control port command format]

Hidden services that use the control port interface for client auth need to perform their own key management.

G.2.2. Client side

There should also be a control port interface for clients to register authorization data for hidden services without having to use the torrc. It should allow both generation of client authorization private keys and import of client authorization data provided by a hidden service.

This way, Tor Browser can present "Generate client auth keys" and "Import client auth keys" dialogs to users when they try to visit a hidden service that is protected by client authorization.

Specifically, we require two new control port commands:

    IMPORT_ONION_CLIENT_AUTH_DATA
    GENERATE_ONION_CLIENT_AUTH_DATA

which import and generate client authorization data respectively.

[XXX how does key management work here?]

[XXX what happens when people use both the control port interface and the filesystem interface?]

Appendix F: Two methods for managing revision counters.

Implementations MAY generate revision counters in any way they please, so long as they are monotonically increasing over the lifetime of each blinded public key. But to avoid fingerprinting, implementors SHOULD choose a strategy also used by other Tor implementations. Here we describe two, and additionally list some strategies that implementors should NOT use.

F.1. Increment-on-generation

This is the simplest strategy, and the one used by Tor through at least version 0.3.4.0-alpha.

Whenever using a new blinded key, the service records the highest revision counter it has used with that key. When generating a descriptor, the service uses the smallest non-negative number higher than any number it has already used.

In other words, the revision counters under this system start fresh with each blinded key as 0, 1, 2, 3, and so on.

F.2. Encrypted time in period

This scheme is what we recommend for situations when multiple service instances need to coordinate their revision counters, without an actual coordination mechanism.

Let T be the number of seconds that have elapsed since the descriptor became valid, plus 1. (T must be at least 1.) Implementations can use the number of seconds since the start time of the shared random protocol run that corresponds to this descriptor.

Let S be a secret that all the service providers share. For example, it could be the private signing key corresponding to the current blinded key.

Let K be an AES-256 key, generated as

    K = H("rev-counter-generation" | S)

Use K, and AES in counter mode with IV=0, to generate a stream of T * 2 bytes. Consider these bytes as a sequence of T 16-bit little-endian words. Add these words.

Let the sum of these words be the revision counter.

Cryptowiki attributes roughly this scheme to G. Bebek in:

    G. Bebek. Anti-tamper database research: Inference control techniques. Technical Report EECS 433 Final Report, Case Western Reserve University, November 2002.

Although we believe it is suitable for use in this application, it is not a perfect order-preserving encryption algorithm (and all order-preserving encryption has weaknesses). Please think twice before using it for anything else.

(This scheme can be optimized pretty easily by caching the encryption of X*1, X*2, X*3, etc for some well chosen X.)

For a slow reference implementation, see src/test/ope_ref.py in the Tor source repository. [XXXX for now, see the same file in Nick's "ope_hax" branch -- it isn't merged yet.]

This scheme is not currently implemented in Tor.

F.X. Some revision-counter strategies to avoid

Though it might be tempting, implementations SHOULD NOT use the current time or the current time within the period directly as their revision counter -- doing so leaks their view of the current time, which can be used to link the onion service to other services run on the same host.

Similarly, implementations SHOULD NOT let the revision counter increase forever without resetting it -- doing so links the service across changes in the blinded public key.
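The structure of the "encrypted time in period" scheme from F.2 can be sketched as follows. Note the loud caveat: to stay within the Python standard library, this sketch substitutes a SHAKE-256 output stream for the AES-256 counter-mode keystream that the text calls for, so the numbers it produces will NOT match an AES-based implementation. It also assumes H is SHA3-256. It is meant only to show how the keystream words are summed and why the result never decreases as T grows; the function names are hypothetical.

```python
import hashlib


def keystream(key: bytes, nbytes: int) -> bytes:
    # STAND-IN, not the specified cipher: the scheme uses AES-256 in counter
    # mode with IV=0. SHAKE-256 is used here only because it is in the
    # standard library and, like a CTR keystream, a shorter output is always
    # a prefix of a longer one.
    return hashlib.shake_256(key).digest(nbytes)


def revision_counter(shared_secret: bytes, seconds_since_valid: int) -> int:
    """Sum the first T 16-bit little-endian keystream words."""
    if seconds_since_valid < 0:
        raise ValueError("elapsed time must be non-negative")
    T = seconds_since_valid + 1  # T must be at least 1
    # K = H("rev-counter-generation" | S); assumption: H is SHA3-256.
    K = hashlib.sha3_256(b"rev-counter-generation" + shared_secret).digest()
    stream = keystream(K, T * 2)
    return sum(int.from_bytes(stream[i:i + 2], "little")
               for i in range(0, len(stream), 2))
```

Because the counter for T+1 is the counter for T plus one more non-negative word, any two instances sharing S derive counters that are ordered consistently with their elapsed time, without exchanging messages.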

Appendix G: Test vectors

G.1. Test vectors for hs-ntor / NTOR-WITH-EXTRA-DATA

Here is a set of test values for the hs-ntor handshake, called [NTOR-WITH-EXTRA-DATA] in this document. They were generated by instrumenting Tor's code to dump the values for an INTRODUCE/RENDEZVOUS handshake, and then by running that code on a Chutney network.

We assume an onion service with:

    KP_hs_ipd_sid = 34E171E4358E501BFF21ED907E96AC6B
                    FEF697C779D040BBAF49ACC30FC5D21F
    KP_hss_ntor   = 8E5127A40E83AABF6493E41F142B6EE3
                    604B85A3961CD7E38D247239AFF71979
    KS_hss_ntor   = A0ED5DBF94EEB2EDB3B514E4CF6ABFF6
                    022051CC5F103391F1970A3FCD15296A
    N_hs_subcred  = 0085D26A9DEBA252263BF0231AEAC59B
                    17CA11BAD8A218238AD6487CBAD68B57

The client wants to make an INTRODUCE request. It generates the following header (everything before the ENCRYPTED portion) of its INTRODUCE1 cell:

    H = 000000000000000000000000000000000000000002002034E171E4358E501BFF
        21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00

It generates the following plaintext body to encrypt. (This is the "decrypted plaintext body" from [PROCESS_INTRO2].)

    P = 6BD364C12638DD5C3BE23D76ACA05B04E6CE932C0101000100200DE6130E4FCA
        C4EDDA24E21220CC3EADAE403EF6B7D11C8273AC71908DE565450300067F0000
        0113890214F823C4F8CC085C792E0AEE0283FE00AD7520B37D0320728D5DF39B
        7B7077A0118A900FF4456C382F0041300ACF9C58E51C392795EF870000000000
        0000000000000000000000000000000000000000000000000000000000000000
        000000000000000000000000000000000000000000000000000000000000

The client now begins the hs-ntor handshake. It generates a curve25519 keypair:

    x = 60B4D6BF5234DCF87A4E9D7487BDF3F4
        A69B6729835E825CA29089CFDDA1E341
    X = BF04348B46D09AED726F1D66C618FDEA
        1DE58E8CB8B89738D7356A0C59111D5D

Then it calculates:

    ENC_KEY = 9B8917BA3D05F3130DACCE5300C3DC27
              F6D012912F1C733036F822D0ED238706
    MAC_KEY = FC4058DA59D4DF61E7B40985D122F502
              FD59336BC21C30CAF5E7F0D4A2C38FD5

With these, it encrypts the plaintext body P with ENC_KEY, getting an encrypted value C. It computes MAC(MAC_KEY, H | X | C), getting a MAC value M. It then assembles the final INTRODUCE1 body as H | X | C | M:

    000000000000000000000000000000000000000002002034E171E4358E501BFF
    21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00BF04348B46D09AED
    726F1D66C618FDEA1DE58E8CB8B89738D7356A0C59111D5DADBECCCB38E37830
    4DCC179D3D9E437B452AF5702CED2CCFEC085BC02C4C175FA446525C1B9D5530
    563C362FDFFB802DAB8CD9EBC7A5EE17DA62E37DEEB0EB187FBB48C63298B0E8
    3F391B7566F42ADC97C46BA7588278273A44CE96BC68FFDAE31EF5F0913B9A9C
    7E0F173DBC0BDDCD4ACB4C4600980A7DDD9EAEC6E7F3FA3FC37CD95E5B8BFB3E
    35717012B78B4930569F895CB349A07538E42309C993223AEA77EF8AEA64F25D
    DEE97DA623F1AEC0A47F150002150455845C385E5606E41A9A199E7111D54EF2
    D1A51B7554D8B3692D85AC587FB9E69DF990EFB776D8
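As a quick sanity check on the header H from this vector, the sketch below splits it into fields and confirms that the embedded key equals KP_hs_ipd_sid. The field layout used here (20-byte legacy key ID, 1-byte auth key type, 2-byte big-endian key length, the key, then a 1-byte extension count) is taken from the INTRODUCE1 cell format described earlier in this specification, not from this appendix, so treat it as an assumption of the example. The function name is hypothetical.

```python
# Header H from the test vector: 20 zero bytes, then the remaining fields.
H = bytes.fromhex(
    "00" * 20
    + "02002034E171E4358E501BFF"
    + "21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00"
)
KP_HS_IPD_SID = bytes.fromhex(
    "34E171E4358E501BFF21ED907E96AC6B"
    "FEF697C779D040BBAF49ACC30FC5D21F"
)


def parse_introduce1_header(header: bytes) -> dict:
    """Split the unencrypted INTRODUCE1 header into its fields."""
    legacy_key_id = header[:20]
    auth_key_type = header[20]
    auth_key_len = int.from_bytes(header[21:23], "big")
    auth_key = header[23:23 + auth_key_len]
    n_extensions = header[23 + auth_key_len]
    return {
        "legacy_key_id": legacy_key_id,
        "auth_key_type": auth_key_type,
        "auth_key": auth_key,
        "n_extensions": n_extensions,
    }


fields = parse_introduce1_header(H)
print(fields["auth_key"] == KP_HS_IPD_SID)
```

The same slicing shows where X begins in the assembled body: immediately after the 56 bytes of H.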

Later the service receives that body in an INTRODUCE2 cell. It processes it according to the hs-ntor handshake, and recovers the client's plaintext P. To continue the hs-ntor handshake, the service chooses a curve25519 keypair:

    y = 68CB5188CA0CD7924250404FAB54EE13
        92D3D2B9C049A2E446513875952F8F55
    Y = 8FBE0DB4D4A9C7FF46701E3E0EE7FD05
        CD28BE4F302460ADDEEC9E93354EE700

From this and the client's input, it computes:

    AUTH_INPUT_MAC = 4A92E8437B8424D5E5EC279245D5C72B
                     25A0327ACF6DAF902079FCB643D8B208
    NTOR_KEY_SEED  = 4D0C72FE8AFF35559D95ECC18EB5A368
                     83402B28CDFD48C8A530A5A3D7D578DB

The service sends back Y | AUTH_INPUT_MAC in its RENDEZVOUS1 cell body. From these, the client finishes the handshake, validates AUTH_INPUT_MAC, and computes the same NTOR_KEY_SEED.

Now that both parties have the same NTOR_KEY_SEED, they can derive the shared key material they will use for their circuit.

BridgeDB specification

                              Karsten Loesing
                               Nick Mathewson

Table of Contents

    0. Preliminaries
    1. Importing bridge network statuses and bridge descriptors
        1.1. Parsing bridge network statuses
        1.2. Parsing bridge descriptors
        1.3. Parsing extra-info documents
    2. Assigning bridges to distributors
    3. Giving out bridges upon requests
    4. Selecting bridges to be given out based on IP addresses
    5. Selecting bridges to be given out based on email addresses
    6. Selecting unallocated bridges to be stored in file buckets
    7. Displaying Bridge Information
    8. Writing bridge assignments for statistics

Preliminaries

This document specifies how BridgeDB processes bridge descriptor files to learn about new bridges, maintains persistent assignments of bridges to distributors, and decides which bridges to give out upon user requests.

Some of the decisions here may be suboptimal: this document is meant to specify current behavior as of August 2013, not to specify ideal behavior.

Importing bridge network statuses and bridge descriptors

BridgeDB learns about bridges by parsing bridge network statuses, bridge descriptors, and extra info documents as specified in Tor's directory protocol. BridgeDB parses one bridge network status file first and at least one bridge descriptor file and potentially one extra info file afterwards.

BridgeDB scans its files on SIGHUP.

BridgeDB does not validate signatures on descriptors or networkstatus files: the operator needs to make sure that these documents have come from a Tor instance that did the validation for us.

Parsing bridge network statuses

Bridge network status documents contain the information of which bridges are known to the bridge authority and which flags the bridge authority assigns to them. We expect bridge network statuses to contain at least the following two lines for every bridge in the given order (format fully specified in Tor's directory protocol):

    "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort SP DirPort NL
    "a" SP address ":" port NL (no more than 8 instances)
    "s" SP Flags NL

BridgeDB parses the identity and the publication timestamp from the "r" line, the OR address(es) and ORPort(s) from the "a" line(s), and the assigned flags from the "s" line, specifically checking the assignment of the "Running" and "Stable" flags. BridgeDB memorizes all bridges that have the Running flag as the set of running bridges that can be given out to bridge users. BridgeDB memorizes assigned flags if it wants to ensure that sets of bridges given out should contain at least a given number of bridges with these flags.
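The parsing step just described can be sketched as follows. This is a simplified illustration, not BridgeDB's parser: it assumes that the publication field is a date and a time separated by a space (so an "r" line has nine space-separated tokens), keeps the identity as the string found in the document, and does no validation beyond token counts. The function names are hypothetical.

```python
def parse_bridge_networkstatus(text: str) -> dict:
    """Map identity -> {published, or_addresses, flags} from r/a/s lines."""
    bridges = {}
    current = None
    for line in text.splitlines():
        fields = line.split(" ")
        keyword = fields[0]
        if keyword == "r":
            if len(fields) != 9:
                current = None  # malformed entry: skip its a/s lines too
                continue
            _, _nick, identity, _digest, date, time, ip, or_port, _dir = fields
            current = {
                "published": f"{date} {time}",
                "or_addresses": [(ip, int(or_port))],
                "flags": set(),
            }
            bridges[identity] = current
        elif keyword == "a" and current is not None and len(fields) == 2:
            # IPv6 addresses are bracketed, so split on the last colon.
            host, _, port = fields[1].rpartition(":")
            current["or_addresses"].append((host, int(port)))
        elif keyword == "s" and current is not None:
            current["flags"] = set(fields[1:])
    return bridges


def running_bridges(bridges: dict) -> set:
    """Identities of bridges that carry the Running flag."""
    return {ident for ident, b in bridges.items() if "Running" in b["flags"]}
```

Only the identities returned by `running_bridges` would be candidates for distribution.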

Parsing bridge descriptors

BridgeDB learns about a bridge's most recent IP address and OR port from parsing bridge descriptors. In theory, both IP address and OR port of a bridge are also contained in the "r" line of the bridge network status, so there is no mandatory reason for parsing bridge descriptors. But the functionality described in this section is still implemented in case we need data from the bridge descriptor in the future.

Bridge descriptor files may contain one or more bridge descriptors. We expect a bridge descriptor to contain at least the following lines in the stated order:

    "@purpose" SP purpose NL
    "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
    "published" SP timestamp
    ["opt" SP] "fingerprint" SP fingerprint NL
    "router-signature" NL Signature NL

BridgeDB parses the purpose, IP, ORPort, nickname, and fingerprint from these lines. BridgeDB skips bridge descriptors if the fingerprint is not contained in the bridge network status parsed earlier or if the bridge does not have the Running flag. BridgeDB discards bridge descriptors which have a different purpose than "bridge". BridgeDB can be configured to only accept descriptors with another purpose or not discard descriptors based on purpose at all. BridgeDB memorizes the IP addresses and OR ports of the remaining bridges. If there is more than one bridge descriptor with the same fingerprint, BridgeDB memorizes the IP address and OR port of the most recently parsed bridge descriptor. If BridgeDB does not find a bridge descriptor for a bridge contained in the bridge network status parsed before, it does not add that bridge to the set of bridges to be given out to bridge users.

Parsing extra-info documents

BridgeDB learns if a bridge supports a pluggable transport by parsing extra-info documents. Extra-info documents contain the name of the bridge (but only if it is named), the bridge's fingerprint, the type of pluggable transport(s) it supports, and the IP address and port number on which each transport listens, respectively.

Extra-info documents may contain zero or more entries per bridge. We expect an extra-info entry to contain the following lines in the stated order:

    "extra-info" SP name SP fingerprint NL
    "transport" SP transport SP IP ":" PORT ARGS NL

BridgeDB parses the fingerprint, transport type, IP address, port and any arguments that are specified on these lines. BridgeDB skips the name. If the fingerprint is invalid, BridgeDB skips the entry. BridgeDB memorizes the transport type, IP address, port number, and any arguments that are provided, and then assigns them to the corresponding bridge based on the fingerprint. Arguments are comma-separated and are of the form k=v,k=v. Bridges that do not have an associated extra-info entry are not invalid.
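A minimal sketch of parsing a single "transport" line follows. It assumes that ARGS, when present, is separated from the address by one space and consists of comma-separated k=v pairs, and it splits the address on its last colon so that bracketed IPv6 addresses survive. The function name and the sample transport name are hypothetical.

```python
def parse_transport_line(line: str):
    """Return (transport, ip, port, args) for a 'transport' line."""
    fields = line.strip().split(" ")
    if len(fields) < 3 or fields[0] != "transport":
        raise ValueError("not a transport line")
    transport = fields[1]
    ip, sep, port = fields[2].rpartition(":")
    if not sep or not ip:
        raise ValueError("expected IP:PORT")
    args = {}
    if len(fields) > 3:
        for pair in fields[3].split(","):
            key, eq, value = pair.partition("=")
            if not eq or not key:
                raise ValueError(f"malformed argument: {pair!r}")
            args[key] = value
    return transport, ip, int(port), args
```

For example, `parse_transport_line("transport obfs4 192.0.2.7:443 cert=abc,iat-mode=0")` yields the transport name, the address and port, and a two-entry argument dictionary.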

Assigning bridges to distributors

A "distributor" is a mechanism by which bridges are given (or not given) to clients. The current distributors are "email", "https", and "unallocated".

BridgeDB assigns bridges to distributors based on an HMAC hash of the bridge's ID and a secret and makes these assignments persistent. Persistence is achieved by using a database to map node ID to distributor. Each bridge is assigned to exactly one distributor (including the "unallocated" distributor). BridgeDB may be configured to support only a non-empty subset of the distributors specified in this document. BridgeDB may be configured to use different probabilities for assigning new bridges to distributors. BridgeDB does not change existing assignments of bridges to distributors, even if probabilities for assigning bridges to distributors change or distributors are disabled entirely.
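The assignment rule can be sketched as shown below. This is an illustration under assumptions, not BridgeDB's code: HMAC-SHA1 is assumed as the HMAC, a plain dictionary stands in for the database, and the relative shares are made-up configuration values. The function name is hypothetical.

```python
import hashlib
import hmac


def assign_distributor(bridge_id: bytes, secret: bytes,
                       shares: dict, assignments: dict) -> str:
    """Return the distributor for bridge_id, assigning it if it is new.

    shares maps distributor name -> relative weight; assignments is the
    persistent store (a dict here, a database in practice).
    """
    # Existing assignments never change, even if the shares do.
    if bridge_id in assignments:
        return assignments[bridge_id]
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("no distributor has a positive share")
    digest = hmac.new(secret, bridge_id, hashlib.sha1).digest()
    point = int.from_bytes(digest, "big") % total
    for name, share in shares.items():
        if point < share:
            assignments[bridge_id] = name
            return name
        point -= share
    raise AssertionError("unreachable: point is always below total")
```

Because the HMAC is keyed with a secret, an outsider who knows a bridge's ID cannot predict which distributor it lands in, while the operator gets a stable, reproducible split.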

Giving out bridges upon requests

Upon receiving a client request, a BridgeDB distributor provides a subset of the bridges assigned to it. BridgeDB only gives out bridges that are contained in the most recently parsed bridge network status and that have the Running flag set (see Section 1). BridgeDB may be configured to give out a different number of bridges (typically 4) depending on the distributor. BridgeDB may define an arbitrary number of rules. These rules may specify the criteria by which a bridge is selected. Specifically, the available rules restrict the IP address version, OR port number, transport type, bridge relay flag, or country in which the bridge should not be blocked.

Selecting bridges to be given out based on IP addresses

BridgeDB may be configured to support one or more distributors which give out bridges based on the requestor's IP address. Currently, this is how the HTTPS distributor works. The goal is to avoid handing out all the bridges to users in a similar IP space and time.

# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL

BridgeDB fixes the set of bridges to be returned for a defined time period. BridgeDB considers all IP addresses coming from the same /24 network as the same IP address and returns the same set of bridges. From here on, this non-unique address will be referred to as the IP address's 'area'. BridgeDB divides the IP address space equally into a small number of disjoint clusters (typically 4) and returns different results for requests coming from addresses that are placed into different clusters.

# Note, changed term from "areas" to "disjoint clusters" -MF

# I found that BridgeDB is not strict in returning only bridges for a
# given area. If a ring is empty, it considers the next one. Is this
# expected behavior? -KL
#
# This does not appear to be the case, anymore. If a ring is empty, then
# BridgeDB simply returns an empty set of bridges. -MF
#
# I also found that BridgeDB does not make the assignment to areas
# persistent in the database. So, if we change the number of rings, it
# will assign bridges to other rings. I assume this is okay? -KL

BridgeDB maintains a list of proxy IP addresses and returns the same set of bridges to requests coming from these IP addresses. The bridges returned to proxy IP addresses do not come from the same set as those for the general IP address space.

BridgeDB can be configured to include bridge fingerprints in replies along with bridge IP addresses and OR ports. BridgeDB can be configured to display a CAPTCHA which the user must solve prior to returning the requested bridges.

The current algorithm is as follows. An IP-based distributor splits the bridges uniformly into a set of "rings" based on an HMAC of their ID. Some of these rings are "area" rings for parts of IP space; some are "category" rings for categories of IPs (like proxies). When a client makes a request from an IP, the distributor first sees whether the IP is in one of the categories it knows. If so, the distributor returns bridges from the category rings. If not, the distributor maps the IP into an "area" (that is, a /24), and then uses an HMAC to map the area to one of the area rings.

When the IP-based distributor determines from which area ring it is handing out bridges, it identifies which rules it will use to choose appropriate bridges. Using this information, it searches its cache of rings for one that already adheres to the criteria specified in this request. If one exists, then BridgeDB maps the current "epoch" (N-hour period) and the IP's area (/24) to a point on the ring based on HMAC, and hands out bridges at that point. If a ring does not already exist which satisfies this request, then a new ring is created and filled with bridges that fulfill the requirements. This ring is then used to select bridges as described.

"Mapping X to Y based on an HMAC" above means one of the following:

    - We keep all of the elements of Y in some order, with a mapping from all 160-bit strings to positions in Y.
    - We take an HMAC of X using some fixed string as a key to get a 160-bit value. We then map that value to the next position of Y.
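The mapping described above can be sketched with a sorted list and a binary search. This illustration assumes HMAC-SHA1 (which yields the 160-bit values mentioned), handles IPv4 only when computing the /24 area, and invents a simple "epoch|area" string as the value X being mapped. The class and function names, the key, and that string format are all hypothetical.

```python
import bisect
import hashlib
import hmac
import ipaddress


def area_of(ip: str) -> str:
    """Collapse an IPv4 address to its /24 'area'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def hmac160(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha1).digest()


class HmacRing:
    """Elements of Y kept in order of their HMAC positions."""

    def __init__(self, key: bytes, bridge_ids):
        self._key = key
        self._points = sorted((hmac160(key, b), b) for b in bridge_ids)

    def lookup(self, x: bytes, count: int):
        """Return up to count bridges starting at the next position after HMAC(x)."""
        position = hmac160(self._key, x)
        start = bisect.bisect_left(self._points, (position,))
        n = len(self._points)
        return [self._points[(start + i) % n][1] for i in range(min(count, n))]


def request_value(epoch: int, ip: str) -> bytes:
    """Hypothetical encoding of (epoch, area) as the value X to be mapped."""
    return f"{epoch}|{area_of(ip)}".encode()
```

Two clients in the same /24 during the same epoch produce the same X, land on the same ring position, and therefore receive the same bridges.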

When giving out bridges based on a position in a ring, BridgeDB first looks at flag requirements and port requirements. For example, BridgeDB may be configured to "Give out at least L bridges with port 443, and at least M bridges with Stable, and at most N bridges total." To do this, BridgeDB combines the following results:

    - The first L bridges in the ring after the position that have the port 443, and
    - The first M bridges in the ring after the position that have the flag stable and that it has not already decided to give out, and
    - The first N-L-M bridges in the ring after the position that it has not already decided to give out.

After BridgeDB selects appropriate bridges to return to the requestor, it then prioritises the ordering of them in a list so that as many criteria are fulfilled as possible within the first few bridges. This list is then truncated to N bridges, if possible. N is currently defined as a piecewise function of the number of bridges in the ring such that:

             /
            |  1,   if len(ring) < 20
            |
        N = |  2,   if 20 <= len(ring) < 100
            |
            |  3,   if 100 <= len(ring)
             \

The bridges in this sublist, containing no more than N bridges, are the bridges returned to the requestor.
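The three-pass selection and the piecewise definition of N can be sketched as follows. The `Bridge` record and function names are hypothetical, the ring is assumed to be given as a list already rotated to start at the chosen position, and a ring of exactly 100 bridges is treated as belonging to the N = 3 case. The reordering step that prioritises criteria is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    or_port: int
    flags: frozenset = frozenset()


def bridges_per_answer(ring_len: int) -> int:
    """Piecewise N; assumption: len(ring) == 100 yields 3."""
    if ring_len < 20:
        return 1
    if ring_len < 100:
        return 2
    return 3


def select_bridges(ring_after_position, L: int, M: int, N: int):
    """Combine the port-443, Stable and fill-up passes, in ring order."""
    chosen = []

    def take(predicate, limit):
        for bridge in ring_after_position:
            if limit <= 0:
                return
            if bridge not in chosen and predicate(bridge):
                chosen.append(bridge)
                limit -= 1

    take(lambda b: b.or_port == 443, L)          # first L with port 443
    take(lambda b: "Stable" in b.flags, M)       # first M Stable, not yet chosen
    take(lambda b: True, max(0, N - L - M))      # first N-L-M of the rest
    return chosen[:N]
```

Note that when fewer than L or M matching bridges exist, this literal reading returns fewer than N bridges.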

Selecting bridges to be given out based on email addresses

BridgeDB can be configured to support one or more distributors that give out bridges based on the requestor's email address. Currently, this is how the email distributor works. The goal is to bootstrap based on one or more popular email services' sybil prevention algorithms.

# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL

BridgeDB rejects email addresses containing other characters than the ones that RFC2822 allows. BridgeDB may be configured to reject email addresses containing other characters it might not process correctly.

# I don't think we do this, is it worthwhile? -MF

BridgeDB rejects email addresses coming from other domains than a configured set of permitted domains. BridgeDB normalizes email addresses by removing "." characters and by removing parts after the first "+" character. BridgeDB can be configured to discard requests that do not have the value "pass" in their X-DKIM-Authentication-Result header or do not have this header. The X-DKIM-Authentication-Result header is set by the incoming mail stack that needs to check DKIM authentication. BridgeDB does not return a new set of bridges to the same email address until a given time period (typically a few hours) has passed.

# Why don't we fix the bridges we give out for a global 3-hour time period
# like we do for IP addresses? This way we could avoid storing email
# addresses. -KL
# The 3-hour value is probably much too short anyway. If we take longer
# time values, then people get new bridges when bridges show up, as
# opposed to when we decide to reset the bridges we give them. (Yes, this
# problem exists for the IP distributor). -NM
# I'm afraid I don't fully understand what you mean here. Can you
# elaborate? -KL
#
# Assuming an average churn rate, if we use short time periods, then a
# requestor will receive new bridges based on rate-limiting and will (likely)
# eventually work their way around the ring; eventually exhausting all bridges
# available to them from this distributor. If we use a longer time period,
# then each time the period expires there will be more bridges in the ring
# thus reducing the likelihood of all bridges being blocked and increasing
# the time and effort required to enumerate all bridges. (This is my
# understanding, not from Nick) -MF
# Also, we presently need the cache to prevent replays and because if a user
# sent multiple requests with different criteria in each then we would leak
# additional bridges otherwise. -MF

BridgeDB can be configured to include bridge fingerprints in replies along with bridge IP addresses and OR ports. BridgeDB can be configured to sign all replies using a PGP signing key. BridgeDB periodically discards old email-address-to-bridge mappings. BridgeDB rejects too frequent email requests coming from the same normalized address.

To map previously unseen email addresses to a set of bridges, BridgeDB proceeds as follows:

- It normalizes the email address as above, by stripping out dots, removing all of the localpart after the +, and putting it all in lowercase. (Example: "John.Doe+bridges@example.COM" becomes "johndoe@example.com".)
- It maps an HMAC of the normalized address to a position on its ring of bridges.
- It hands out bridges starting at that position, based on the port/flag requirements, as specified at the end of section 4.

See section 4 for the details of how bridges are selected from the ring and returned to the requestor.
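The normalization and mapping steps above can be sketched as follows. This is an illustration only: the function names and the key parameter are hypothetical, and the choice of SHA-256 as the HMAC digest is an assumption, since this section does not name the hash function.

```python
import hashlib
import hmac


def normalize_address(address: str) -> str:
    """Strip dots and any "+suffix" from the localpart, then lowercase."""
    localpart, _, domain = address.rpartition("@")
    localpart = localpart.split("+", 1)[0].replace(".", "")
    return f"{localpart}@{domain}".lower()


def ring_position(hmac_key: bytes, address: str) -> str:
    """Map a normalized address to a ring position (as a hex string).

    Assumption: HMAC-SHA256; the digest is not specified in this section.
    """
    normalized = normalize_address(address).encode("utf-8")
    return hmac.new(hmac_key, normalized, hashlib.sha256).hexdigest()
```

Because the position depends only on the normalized address, "John.Doe+bridges@example.COM" and "johndoe@example.com" land on the same point of the ring.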

Selecting unallocated bridges to be stored in file buckets

# Kaner should have a look at this section. -NM

BridgeDB can be configured to reserve a subset of bridges and not give them out via one of the distributors. BridgeDB assigns reserved bridges to one or more file buckets of fixed sizes and writes these file buckets to disk for manual distribution. BridgeDB ensures that a file bucket always contains the requested number of running bridges. If the requested number of bridges in a file bucket is reduced or the file bucket is not required anymore, the unassigned bridges are returned to the reserved set of bridges. If a bridge stops running, BridgeDB replaces it with another bridge from the reserved set of bridges.

# I'm not sure if there's a design bug in file buckets. What happens if
# we add a bridge X to file bucket A, and X goes offline? We would add
# another bridge Y to file bucket A. OK, but what if X comes back? We
# cannot put it back in file bucket A, because it's full. Are we going to
# add it to a different file bucket? Doesn't that mean that most bridges
# will be contained in most file buckets over time? -KL
#
# This should be handled the same as if the file bucket is reduced in size.
# If X returns, then it should be added to the appropriate distributor. -MF

Displaying Bridge Information

After bridges are selected using one of the methods described in Sections 4 - 6, they are output in one of two formats. Bridges are formatted as:

address:port NL

Pluggable transports are formatted as:

transportname SP address:port [SP arglist] NL

where arglist is an optional space-separated list of key-value pairs in the form of k=v.

Previously, each line was prepended with the "bridge" keyword, such as

"bridge" SP address:port NL

"bridge" SP transportname SP address:port [SP arglist] NL

We don't do this anymore because Vidalia and TorLauncher don't expect it.

See the commit message for b70347a9c5fd769c6d5d0c0eb5171ace2999a736.

Writing bridge assignments for statistics

BridgeDB can be configured to write bridge assignments to disk for statistical analysis. The start of a bridge assignment is marked by the following line:

"bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL

YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB has completed loading new bridges and assigning them to distributors.

For every running bridge there is a line with the following format:

fingerprint SP distributor (SP key "=" value)* NL

The distributor is one out of "email", "https", or "unallocated".

Both "email" and "https" distributors support adding keys for "port", "flag" and "transport". Respectively, the port number, flag name, and transport types are the values. These are used to indicate that a bridge matches certain port, flag, transport criteria of requests.

The "https" distributor also allows the key "ring" with a number as value to indicate to which IP address area the bridge is returned.

The "unallocated" distributor allows the key "bucket" with the file bucket name as value to indicate which file bucket a bridge is assigned to.
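A reader of these files might parse each per-bridge line along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, fields are assumed to be separated by exactly one space, and key/value pairs are kept as an ordered list because the format does not say whether a key may repeat.

```python
DISTRIBUTORS = ("email", "https", "unallocated")


def parse_assignment_line(line: str):
    """Split 'fingerprint SP distributor (SP key "=" value)* NL' into parts."""
    fields = line.rstrip("\n").split(" ")
    if len(fields) < 2:
        raise ValueError("expected at least a fingerprint and a distributor")
    fingerprint, distributor = fields[0], fields[1]
    if distributor not in DISTRIBUTORS:
        raise ValueError(f"unknown distributor: {distributor!r}")
    pairs = []
    for field in fields[2:]:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed key=value pair: {field!r}")
        pairs.append((key, value))
    return fingerprint, distributor, pairs
```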

Extended ORPort for pluggable transports

George Kadianakis, Nick Mathewson

Table of Contents

    1. Overview
    2. Establishing a connection and authenticating.
        2.1. Authentication type: SAFE_COOKIE
            2.1.2. Cookie-file format
            2.1.3. SAFE_COOKIE Protocol specification
    3. The extended ORPort protocol
        3.1. Protocol
        3.2. Command descriptions
            3.2.1. USERADDR
            3.2.2. TRANSPORT
    4. Security Considerations

Overview

This document describes the "Extended ORPort" protocol, a wrapper around Tor's ordinary ORPort protocol for use by bridges that support pluggable transports. It provides a way for server-side PTs and bridges to exchange additional information before beginning the actual OR connection.

See tor-spec.txt for information on the regular OR protocol, and pt-spec.txt for information on pluggable transports.

This protocol was originally proposed in proposal 196, and extended with authentication in proposal 217.

Establishing a connection and authenticating.

When a client (that is to say, a server-side pluggable transport) connects to an Extended ORPort, the server sends:

AuthTypes [variable]
EndAuthTypes [1 octet]

Where,

+ AuthTypes are the authentication schemes that the server supports for this session. They are multiple concatenated 1-octet values that take values from 1 to 255.
+ EndAuthTypes is the special value 0.

The client reads the list of supported authentication schemes, chooses one, and sends it back:

AuthType [1 octet]

Where,

+ AuthType is the authentication scheme that the client wants to use for this session. A valid authentication type takes values from 1 to 255. A value of 0 means that the client did not like the authentication types offered by the server.

If the client sent an AuthType of value 0, or an AuthType that the server does not support, the server MUST close the connection.
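The client side of this negotiation can be sketched as below. The function names are hypothetical, and the default of supporting only type 1 anticipates the SAFE_COOKIE type defined in the next section.

```python
def parse_auth_types(data: bytes):
    """Return the offered AuthTypes, or None if EndAuthTypes (0) is not yet seen."""
    end = data.find(b"\x00")
    if end < 0:
        return None  # incomplete: keep reading from the socket
    return list(data[:end])


def choose_auth_type(offered, supported=(1,)):
    """Pick the first mutually supported type; 0 means none was acceptable."""
    for auth_type in supported:
        if auth_type in offered:
            return auth_type
    return 0
```

A client that ends up sending 0 should expect the server to close the connection, as required above.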

Authentication type: SAFE_COOKIE

We define one authentication type: SAFE_COOKIE. Its AuthType value is 1. It is based on the client proving to the bridge that it can access a given "cookie" file on disk. The purpose of authentication is to defend against cross-protocol attacks.

If the Extended ORPort is enabled, Tor should regenerate the cookie file on startup and store it in $DataDirectory/extended_orport_auth_cookie.

The location of the cookie can be overridden by using the configuration file parameter ExtORPortCookieAuthFile, which is defined as:

ExtORPortCookieAuthFile <path>

where <path> is a filesystem path.

Cookie-file format

The format of the cookie-file is:

StaticHeader [32 octets]
Cookie [32 octets]

Where,

+ StaticHeader is the following string: "! Extended ORPort Auth Cookie !\x0a"
+ Cookie is the shared-secret. During the SAFE_COOKIE protocol, the cookie is called CookieString.

Extended ORPort clients MUST make sure that the StaticHeader is present in the cookie file, before proceeding with the authentication protocol.
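A client could validate the cookie file as in the sketch below. The function name is hypothetical, and rejecting files whose length is not exactly 64 octets is an assumption of this sketch; the text only requires that the StaticHeader be present.

```python
STATIC_HEADER = b"! Extended ORPort Auth Cookie !\x0a"  # 32 octets
COOKIE_LEN = 32


def parse_cookie_file(data: bytes) -> bytes:
    """Check the StaticHeader and return the 32-octet Cookie."""
    if not data.startswith(STATIC_HEADER):
        raise ValueError("StaticHeader missing from cookie file")
    # Assumption: a file of any other total length is treated as invalid.
    if len(data) != len(STATIC_HEADER) + COOKIE_LEN:
        raise ValueError("unexpected cookie file length")
    return data[len(STATIC_HEADER):]
```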

SAFE_COOKIE Protocol specification

A client that performs the SAFE_COOKIE handshake begins by sending:

ClientNonce [32 octets]

Where,

ClientNonce is 32 octets of random data.

Then, the server replies with:

ServerHash [32 octets]
ServerNonce [32 octets]

Where,

+ ServerHash is computed as:
    HMAC-SHA256(CookieString, "ExtORPort authentication server-to-client hash" | ClientNonce | ServerNonce)
+ ServerNonce is 32 random octets.

Upon receiving that data, the client computes ServerHash, and validates it against the ServerHash provided by the server.

If the server-provided ServerHash is invalid, the client MUST terminate the connection.

Otherwise the client replies with:

ClientHash [32 octets]

Where,

+ ClientHash is computed as:
    HMAC-SHA256(CookieString, "ExtORPort authentication client-to-server hash" | ClientNonce | ServerNonce)

Upon receiving that data, the server computes ClientHash, and validates it against the ClientHash provided by the client.

Finally, the server replies with:

Status [1 octet]

Where,

+ Status is 1 if the authentication was successful. If the authentication failed, Status is 0.
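The two hashes exchanged above can be computed with a standard HMAC library. The following is a minimal sketch; the function names are hypothetical, "|" is taken to mean concatenation, and the constant-time comparison is a choice of this sketch, not a requirement stated in the text.

```python
import hashlib
import hmac

SERVER_CONST = b"ExtORPort authentication server-to-client hash"
CLIENT_CONST = b"ExtORPort authentication client-to-server hash"


def server_hash(cookie: bytes, client_nonce: bytes, server_nonce: bytes) -> bytes:
    """HMAC-SHA256(CookieString, server constant | ClientNonce | ServerNonce)."""
    msg = SERVER_CONST + client_nonce + server_nonce
    return hmac.new(cookie, msg, hashlib.sha256).digest()


def client_hash(cookie: bytes, client_nonce: bytes, server_nonce: bytes) -> bytes:
    """HMAC-SHA256(CookieString, client constant | ClientNonce | ServerNonce)."""
    msg = CLIENT_CONST + client_nonce + server_nonce
    return hmac.new(cookie, msg, hashlib.sha256).digest()


def client_respond(cookie: bytes, client_nonce: bytes,
                   received_hash: bytes, server_nonce: bytes) -> bytes:
    """Validate the server's ServerHash and return the ClientHash to send."""
    expected = server_hash(cookie, client_nonce, server_nonce)
    if not hmac.compare_digest(expected, received_hash):
        raise ValueError("invalid ServerHash; terminate the connection")
    return client_hash(cookie, client_nonce, server_nonce)
```

In a real client, ClientNonce would come from a cryptographically secure source such as os.urandom(32).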

The extended ORPort protocol

Once a connection is established and authenticated, the parties communicate with the protocol described here.

Protocol

The extended server port protocol is as follows:

COMMAND [2 bytes, big-endian]
BODYLEN [2 bytes, big-endian]
BODY [BODYLEN bytes]

Commands sent from the transport proxy to the bridge are:

[0x0000] DONE: There is no more information to give. The next bytes sent by the transport will be those tunneled over it. (body ignored)

[0x0001] USERADDR: an address:port string that represents the client's address.

[0x0002] TRANSPORT: a string of the name of the pluggable transport currently in effect on the connection.

Replies sent from tor to the proxy are:

[0x1000] OKAY: Send the user's traffic. (body ignored)

[0x1001] DENY: Tor would prefer not to get more traffic from this address for a while. (body ignored)

[0x1002] CONTROL: (Not used)

Parties MUST ignore command codes that they do not understand.

If the server receives a recognized command that does not parse, it MUST close the connection to the client.
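The framing above maps directly onto two big-endian 16-bit integers followed by the body. The sketch below shows one way to encode and decode such messages; the function names and the "return None when incomplete" convention are assumptions of this example.

```python
import struct

# Commands from the transport proxy to the bridge.
DONE, USERADDR, TRANSPORT = 0x0000, 0x0001, 0x0002
# Replies from tor to the proxy.
OKAY, DENY, CONTROL = 0x1000, 0x1001, 0x1002


def encode_message(command: int, body: bytes = b"") -> bytes:
    """Serialize COMMAND, BODYLEN and BODY."""
    if len(body) > 0xFFFF:
        raise ValueError("body does not fit in a 2-byte BODYLEN")
    return struct.pack(">HH", command, len(body)) + body


def decode_message(buf: bytes):
    """Return (command, body, remaining_bytes), or None if buf is incomplete."""
    if len(buf) < 4:
        return None
    command, bodylen = struct.unpack(">HH", buf[:4])
    if len(buf) < 4 + bodylen:
        return None
    return command, buf[4:4 + bodylen], buf[4 + bodylen:]
```

For instance, a proxy would send encode_message(USERADDR, b"1.2.3.4:5678"), then encode_message(TRANSPORT, b"obfs4"), then encode_message(DONE), and wait for an OKAY or DENY reply.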

Command descriptions

USERADDR

An ASCII string holding the TCP/IP address of the client of the pluggable transport proxy. A Tor bridge SHOULD use that address to collect statistics about its clients. Recognized formats are:

    1.2.3.4:5678
    [1:2::3:4]:5678

(Current Tor versions may accept other formats, but this is a bug: transports MUST NOT send them.)

The string MUST NOT be NUL-terminated.

TRANSPORT

An ASCII string holding the name of the pluggable transport used by the client of the pluggable transport proxy. A Tor bridge that supports multiple transports SHOULD use that information to collect statistics about the popularity of individual pluggable transports.

The string MUST NOT be NUL-terminated.

Pluggable transport names are C-identifiers and Tor MUST check them for correctness.

Security Considerations

Extended ORPort or TransportControlPort do not provide link confidentiality, authentication or integrity. Sensitive data, like cryptographic material, should not be transferred through them.

An attacker with superuser access is able to sniff network traffic, and capture TransportControlPort identifiers and any data passed through those ports.

Tor SHOULD issue a warning if the bridge operator tries to bind Extended ORPort to a non-localhost address.

Pluggable transport proxies SHOULD issue a warning if they are instructed to connect to a non-localhost Extended ORPort.

Pluggable Transport Specification (Version 1)

Abstract

Pluggable Transports (PTs) are a generic mechanism for the rapid development and deployment of censorship circumvention, based around the idea of modular sub-processes that transform traffic to defeat censors.

This document specifies the sub-process startup, shutdown, and inter-process communication mechanisms required to utilize PTs.

Table of Contents

    1. Introduction
        1.1. Requirements Notation
    2. Architecture Overview
    3. Specification
        3.1. Pluggable Transport Naming
        3.2. Pluggable Transport Configuration Environment Variables
            3.2.1. Common Environment Variables
            3.2.2. Pluggable Transport Client Environment Variables
            3.2.3. Pluggable Transport Server Environment Variables
        3.3. Pluggable Transport To Parent Process Communication
            3.3.1. Common Messages
            3.3.2. Pluggable Transport Client Messages
            3.3.3. Pluggable Transport Server Messages
        3.4. Pluggable Transport Shutdown
        3.5. Pluggable Transport Client Per-Connection Arguments
    4. Anonymity Considerations
    5. References
    6. Acknowledgments
    Appendix A. Example Client Pluggable Transport Session
    Appendix B. Example Server Pluggable Transport Session

Introduction

This specification describes a way to decouple protocol-level obfuscation from an application's client/server code, in a manner that promotes rapid development of obfuscation/circumvention tools and promotes reuse beyond the scope of the Tor Project's efforts in that area.

This is accomplished by utilizing helper sub-processes that implement the necessary forward/reverse proxy servers that handle the censorship circumvention, with a well defined and standardized configuration and management interface.

Any application code that implements the interfaces as specified in this document will be able to use all spec compliant Pluggable Transports.

Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Architecture Overview

     +------------+                    +---------------------------+
     | Client App +-- Local Loopback --+ PT Client (SOCKS Proxy)   +--+
     +------------+                    +---------------------------+  |
                                                                      |
                 Public Internet (Obfuscated/Transformed traffic) ==> |
                                                                      |
     +------------+                    +---------------------------+  |
     | Server App +-- Local Loopback --+ PT Server (Reverse Proxy) +--+
     +------------+                    +---------------------------+

On the client's host, the PT Client software exposes a SOCKS proxy [RFC1928] to the client application, and obfuscates or otherwise transforms traffic before forwarding it to the server's host.

On the server's host, the PT Server software exposes a reverse proxy that accepts connections from PT Clients, and handles reversing the obfuscation/transformation applied to traffic, before forwarding it to the actual server software. An optional lightweight protocol exists to facilitate communicating connection meta-data that would otherwise be lost such as the source IP address and port [EXTORPORT].

All PT instances are configured by the respective parent process via a set of standardized environment variables (3.2) that are set at launch time, and report status information back to the parent via writing output in a standardized format to stdout (3.3).

Each invocation of a PT MUST be either a client OR a server.

All PT client forward proxies MUST support either SOCKS 4 or SOCKS 5, and SHOULD prefer SOCKS 5 over SOCKS 4.

Specification

Pluggable Transport proxies follow the following workflow throughout their lifespan.

1) Parent process sets the required environment values (3.2) and launches the PT proxy as a sub-process (fork()/exec()).

2) The PT Proxy determines the versions of the PT specification supported by the parent via "TOR_PT_MANAGED_TRANSPORT_VER" (3.2.1)

    2.1) If there are no compatible versions, the PT proxy writes a "VERSION-ERROR" message (3.3.1) to stdout and terminates.

    2.2) If there is a compatible version, the PT proxy writes a "VERSION" message (3.3.1) to stdout.

3) The PT Proxy parses the rest of the environment values.

    3.1) If the environment values are malformed, or otherwise invalid, the PT proxy writes an "ENV-ERROR" message (3.3.1) to stdout and terminates.

    3.2) Determining if it is a client side forward proxy or a server side reverse proxy can be done via examining the "TOR_PT_CLIENT_TRANSPORTS" and "TOR_PT_SERVER_TRANSPORTS" environment variables.

4) (Client only) If there is an upstream proxy specified via "TOR_PT_PROXY" (3.2.2), the PT proxy validates the URI provided.

    4.1) If the upstream proxy is unusable, the PT proxy writes a "PROXY-ERROR" message (3.3.2) to stdout and terminates.

    4.2) If there is a supported and well-formed upstream proxy the PT proxy writes a "PROXY DONE" message (3.3.2) to stdout.

5) The PT Proxy initializes the transports and reports the status via stdout (3.3.2, 3.3.3)

6) The PT Proxy forwards and transforms traffic as appropriate.

7) Upon being signaled to terminate by the parent process (3.4), the PT Proxy gracefully shuts down.

Pluggable Transport Naming

Pluggable Transport names serve as unique identifiers, and every PT MUST have a unique name.

PT names MUST be valid C identifiers. PT names MUST begin with a letter or underscore, and the remaining characters MUST be ASCII letters, numbers or underscores. No length limit is imposed.

PT names MUST satisfy the regular expression "[a-zA-Z_][a-zA-Z0-9_]*".
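A name check based on this regular expression might look like the sketch below. The function name is hypothetical; note that the whole string has to match, which is why fullmatch() is used rather than match().

```python
import re

_PT_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_pt_name(name: str) -> bool:
    """True if the entire string is a valid Pluggable Transport name."""
    return _PT_NAME.fullmatch(name) is not None
```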

Pluggable Transport Configuration Environment Variables

All Pluggable Transport proxy instances are configured by their parent process at launch time via a set of well defined environment variables.

The "TOR_PT_" prefix is used for namespacing reasons and does not indicate any relations to Tor, except for the origins of this specification.

Common Environment Variables

When launching either a client or server Pluggable Transport proxy, the following common environment variables MUST be set.

"TOR_PT_MANAGED_TRANSPORT_VER"

Specifies the versions of the Pluggable Transport specification the parent process supports, delimited by commas. All PTs MUST accept any well-formed list, as long as a compatible version is present.

Valid versions MUST consist entirely of non-whitespace, non-comma printable ASCII characters.

The version of the Pluggable Transport specification as of this document is "1".

Example:

TOR_PT_MANAGED_TRANSPORT_VER=1,1a,2b,this_is_a_valid_ver
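A PT that implements only version "1" could handle this variable roughly as follows. This is a sketch: the function name is hypothetical, and the returned strings are the "VERSION" and "VERSION-ERROR" lines that the PT would write to stdout (3.3.1).

```python
SUPPORTED_VERSIONS = ("1",)


def negotiate_version(env_value: str) -> str:
    """Return the stdout line a PT would emit for the offered version list."""
    offered = env_value.split(",")
    for version in SUPPORTED_VERSIONS:
        if version in offered:
            return f"VERSION {version}"
    # No compatible version: the PT reports the error and then terminates.
    return "VERSION-ERROR no-version"
```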

"TOR_PT_STATE_LOCATION"

Specifies an absolute path to a directory where the PT is allowed to store state that will be persisted across invocations. The directory is not required to exist when the PT is launched, however PT implementations SHOULD be able to create it as required.

PTs MUST only store files in the path provided, and MUST NOT create or modify files elsewhere on the system.

Example:

TOR_PT_STATE_LOCATION=/var/lib/tor/pt_state/

"TOR_PT_EXIT_ON_STDIN_CLOSE"

Specifies that the parent process will close the PT proxy's standard input (stdin) stream to indicate that the PT proxy should gracefully exit.

PTs MUST NOT treat a closed stdin as a signal to terminate unless this environment variable is set to "1".

PTs SHOULD treat stdin being closed as a signal to gracefully terminate if this environment variable is set to "1".

Example:

TOR_PT_EXIT_ON_STDIN_CLOSE=1

"TOR_PT_OUTBOUND_BIND_ADDRESS_V4"

Specifies an IPv4 IP address that the PT proxy SHOULD use as source address for outgoing IPv4 IP packets. This feature allows people with multiple network interfaces to specify explicitly which interface they prefer the PT proxy to use.

If this value is unset or empty, the PT proxy MUST use the default source address for outgoing connections.

This setting MUST be ignored for connections to loopback addresses (127.0.0.0/8).

Example:

TOR_PT_OUTBOUND_BIND_ADDRESS_V4=203.0.113.4

"TOR_PT_OUTBOUND_BIND_ADDRESS_V6"

Specifies an IPv6 IP address that the PT proxy SHOULD use as source address for outgoing IPv6 IP packets. This feature allows people with multiple network interfaces to specify explicitly which interface they prefer the PT proxy to use.

If this value is unset or empty, the PT proxy MUST use the default source address for outgoing connections.

This setting MUST be ignored for connections to the loopback address ([::1]).

IPv6 addresses MUST always be wrapped in square brackets.

Example:

TOR_PT_OUTBOUND_BIND_ADDRESS_V6=[2001:db8::4]

Pluggable Transport Client Environment Variables

Client-side Pluggable Transport forward proxies are configured via the following environment variables.

"TOR_PT_CLIENT_TRANSPORTS"

Specifies the PT protocols the client proxy should initialize, as a comma separated list of PT names.

PTs SHOULD ignore PT names that they do not recognize.

Parent processes MUST set this environment variable when launching a client-side PT proxy instance.

Example:

TOR_PT_CLIENT_TRANSPORTS=obfs2,obfs3,obfs4

"TOR_PT_PROXY"

Specifies an upstream proxy that the PT MUST use when making outgoing network connections. It is a URI [RFC3986] of the format:

<proxy_type>://[<user_name>[:<password>]@]<ip>:<port>

The "TOR_PT_PROXY" environment variable is OPTIONAL and MUST be omitted if there is no need to connect via an upstream proxy.

Examples:

TOR_PT_PROXY=socks5://tor:test1234@198.51.100.1:8000
TOR_PT_PROXY=socks4a://198.51.100.2:8001
TOR_PT_PROXY=http://198.51.100.3:443
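Since the value is a URI, a PT written in Python could lean on the standard library to take it apart. The sketch below is illustrative only: the function name is hypothetical, and the set of accepted proxy types is an assumption drawn from the examples above rather than a list given by this document.

```python
from urllib.parse import urlsplit

# Assumption: proxy types taken from the examples in this section.
SUPPORTED_PROXY_TYPES = ("socks5", "socks4a", "http")


def parse_pt_proxy(uri: str) -> dict:
    """Split a TOR_PT_PROXY value; raise ValueError if it is unusable."""
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_PROXY_TYPES:
        raise ValueError(f"unsupported proxy type: {parts.scheme!r}")
    # Accessing .port raises ValueError itself if the port is not numeric.
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URI must contain an address and a port")
    return {
        "type": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }
```

A ValueError here would correspond to writing a "PROXY-ERROR" message and terminating.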

Pluggable Transport Server Environment Variables

Server-side Pluggable Transport reverse proxies are configured via the following environment variables.

"TOR_PT_SERVER_TRANSPORTS"

Specifies the PT protocols the server proxy should initialize, as a comma separated list of PT names.

PTs SHOULD ignore PT names that they do not recognize.

Parent processes MUST set this environment variable when launching a server-side PT reverse proxy instance.

Example:

TOR_PT_SERVER_TRANSPORTS=obfs3,scramblesuit

"TOR_PT_SERVER_TRANSPORT_OPTIONS"

Specifies per-PT protocol configuration directives, as a semicolon-separated list of <key>:<value> pairs, where <key> is a PT name and <value> is a k=v string value with options that are to be passed to the transport.

Colons, semicolons, and backslashes MUST be escaped with a backslash.

If there are no arguments that need to be passed to any of PT transport protocols, "TOR_PT_SERVER_TRANSPORT_OPTIONS" MAY be omitted.
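Parsing this variable means splitting on separators that are not preceded by a backslash and only then removing the escapes. The following sketch shows one way to do so; all function names are hypothetical, and collecting the options into a dictionary per transport is a choice of this example.

```python
def split_unescaped(text: str, sep: str):
    """Split on sep, leaving backslash-escaped characters untouched."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape for later passes
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text: str) -> str:
    """Drop the backslash from every backslash-escaped character."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_transport_options(value: str) -> dict:
    """Map each PT name to a dict of its k=v options."""
    options = {}
    for item in split_unescaped(value, ";"):
        if not item:
            continue
        fields = split_unescaped(item, ":")
        if len(fields) != 2:
            raise ValueError(f"expected <name>:<k=v>, got {item!r}")
        name, argument = unescape(fields[0]), unescape(fields[1])
        key, sep, val = argument.partition("=")
        if not sep:
            raise ValueError(f"expected k=v, got {argument!r}")
        options.setdefault(name, {})[key] = val
    return options
```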

Example:

TOR_PT_SERVER_TRANSPORT_OPTIONS=scramblesuit:key=banana;automata:rule=110;automata:depth=3

Will pass to 'scramblesuit' the parameter 'key=banana' and to 'automata' the arguments 'rule=110' and 'depth=3'.

"TOR_PT_SERVER_BINDADDR"

A comma separated list of <key>-<value> pairs, where <key> is a PT name and <value> is the <address>:<port> on which it should listen for incoming client connections.

The keys holding transport names MUST be in the same order as they appear in "TOR_PT_SERVER_TRANSPORTS".

The <address> MAY be a locally scoped address as long as port forwarding is done externally.

The <address>:<port> combination MUST be an IP address supported by bind(), and MUST NOT be a host name.

Applications MUST NOT set more than one <address>:<port> pair per PT name.

If there is no specific <address>:<port> combination to be configured for any transports, "TOR_PT_SERVER_BINDADDR" MAY be omitted.

Example:

TOR_PT_SERVER_BINDADDR=obfs3-198.51.100.1:1984,scramblesuit-127.0.0.1:4891
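A server-side PT might parse this variable as sketched below. The function name is hypothetical. The sketch relies on the fact that PT names cannot contain a dash, so the first dash separates the name from the address, and it assumes IPv6 addresses are written in square brackets.

```python
import ipaddress


def parse_bindaddr(value: str) -> dict:
    """Map each PT name to its (address, port) listener tuple."""
    result = {}
    for item in value.split(","):
        name, dash, addrport = item.partition("-")
        host, colon, port = addrport.rpartition(":")
        if not dash or not colon:
            raise ValueError(f"malformed entry: {item!r}")
        if name in result:
            raise ValueError(f"more than one address for {name!r}")
        # Raises ValueError for host names, which are not allowed here.
        ipaddress.ip_address(host.strip("[]"))
        result[name] = (host, int(port))
    return result
```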

"TOR_PT_ORPORT"

Specifies the destination that the PT reverse proxy should forward traffic to after transforming it as appropriate, as an <address>:<port>.

Connections to the destination specified via "TOR_PT_ORPORT" MUST only contain application payload. If the parent process requires the actual source IP address of client connections (or other metadata), it should set "TOR_PT_EXTENDED_SERVER_PORT" instead.

Example:

TOR_PT_ORPORT=127.0.0.1:9001

"TOR_PT_EXTENDED_SERVER_PORT"

Specifies the destination that the PT reverse proxy should forward traffic to, via the Extended ORPort protocol [EXTORPORT] as an <address>:<port>.

The Extended ORPort protocol allows the PT reverse proxy to communicate per-connection metadata such as the PT name and client IP address/port to the parent process.

If the parent process does not support the ExtORPort protocol, it MUST set "TOR_PT_EXTENDED_SERVER_PORT" to an empty string.

Example:

TOR_PT_EXTENDED_SERVER_PORT=127.0.0.1:4200

"TOR_PT_AUTH_COOKIE_FILE"

Specifies an absolute filesystem path to the Extended ORPort authentication cookie, required to communicate with the Extended ORPort specified via "TOR_PT_EXTENDED_SERVER_PORT".

If the parent process is not using the ExtORPort protocol for incoming traffic, "TOR_PT_AUTH_COOKIE_FILE" MUST be omitted.

Example:

TOR_PT_AUTH_COOKIE_FILE=/var/lib/tor/extended_orport_auth_cookie

Pluggable Transport To Parent Process Communication

All Pluggable Transport Proxies communicate to the parent process via writing NL-terminated lines to stdout. The line metaformat is:

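<Line> ::= <Keyword> <OptArgs> <NL>
<Keyword> ::= <KeywordChar> | <Keyword> <KeywordChar>
<KeywordChar> ::= <any US-ASCII alphanumeric, dash, and underscore>
<OptArgs> ::= <Args>*
<Args> ::= <SP> <ArgChar> | <Args> <ArgChar>
<ArgChar> ::= <any US-ASCII character but NUL or NL>
<SP> ::= <US-ASCII whitespace symbol (32)>
<NL> ::= <US-ASCII newline (line feed) character (10)>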

The parent process MUST ignore lines received from PT proxies with unknown keywords.
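A parent process could split each received line into its keyword and argument string before dispatching on the keyword. The sketch below is one possible reading of the metaformat: the function name is hypothetical, and it assumes that everything after the first space, up to the newline, is the argument string and that arguments contain neither NUL nor NL.

```python
import re

# Keyword: US-ASCII alphanumerics, dash and underscore.
# Arguments (optional): a space followed by any characters except NUL and NL.
_LINE = re.compile(r"([A-Za-z0-9_-]+)((?: [^\x00\n]*)?)\n")


def parse_pt_line(line: str):
    """Return (keyword, args) for one NL-terminated line from a PT."""
    match = _LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed PT line: {line!r}")
    keyword, args = match.group(1), match.group(2)
    return keyword, args[1:] if args else ""
```

Lines whose keyword is not recognized would then simply be skipped, as required above.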

Common Messages

When a PT proxy first starts up, it must determine which version of the Pluggable Transports Specification to use to configure itself.

It does this via the "TOR_PT_MANAGED_TRANSPORT_VER" (3.2.1) environment variable which contains all of the versions supported by the application.

Upon determining the version to use, or lack thereof, the PT proxy responds with one of two messages.

VERSION-ERROR <ErrorMessage>

The "VERSION-ERROR" message is used to signal that there was no compatible Pluggable Transport Specification version present in the "TOR_PT_MANAGED_TRANSPORT_VER" list.

The <ErrorMessage> SHOULD be set to "no-version" for historical reasons but MAY be set to a useful error message instead.

PT proxies MUST terminate after outputting a "VERSION-ERROR" message.

Example:

VERSION-ERROR no-version

VERSION <ProtocolVersion>

The "VERSION" message is used to signal the Pluggable Transport Specification version (as in "TOR_PT_MANAGED_TRANSPORT_VER") that the PT proxy will use to configure its transports and communicate with the parent process.

The version for the environment values and reply messages specified by this document is "1".

PT proxies MUST either report an error and terminate, or output a "VERSION" message before moving on to client/server proxy initialization and configuration.

Example:

VERSION 1

After version negotiation has been completed the PT proxy must then validate that all of the required environment variables are provided, and that all of the configuration values supplied are well formed.

At any point, if an error related to the configuration supplied via the environment variables is encountered, the PT proxy MAY respond with an error message and terminate.

ENV-ERROR <ErrorMessage>

The "ENV-ERROR" message is used to signal the PT proxy's failure to parse the configuration environment variables (3.2).

The <ErrorMessage> SHOULD consist of a useful error message that can be used to diagnose and correct the root cause of the failure.

PT proxies MUST terminate after outputting an "ENV-ERROR" message.

Example:

ENV-ERROR No TOR_PT_AUTH_COOKIE_FILE when TOR_PT_EXTENDED_SERVER_PORT set

Pluggable Transport Client Messages

After negotiating the Pluggable Transport Specification version, PT client proxies MUST first validate "TOR_PT_PROXY" (3.2.2) if it is set, before initializing any transports.

Assuming that an upstream proxy is provided, PT client proxies MUST respond with a message indicating that the proxy is valid, supported, and will be used OR a failure message.

PROXY DONE

The "PROXY DONE" message is used to signal the PT proxy's acceptance of the upstream proxy specified by "TOR_PT_PROXY".

PROXY-ERROR <ErrorMessage>

The "PROXY-ERROR" message is used to signal that the upstream proxy is malformed/unsupported or otherwise unusable.

PT proxies MUST terminate immediately after outputting a "PROXY-ERROR" message.

Example:

PROXY-ERROR SOCKS 4 upstream proxies unsupported.

After the upstream proxy (if any) is configured, PT clients then iterate over the requested transports in "TOR_PT_CLIENT_TRANSPORTS" and initialize the listeners.

For each transport initialized, the PT proxy reports the listener status back to the parent via messages to stdout.

CMETHOD <transport> <'socks4','socks5'> <address:port>

The "CMETHOD" message is used to signal that a requested PT transport has been launched, the protocol which the parent should use to make outgoing connections, and the IP address and port that the PT transport's forward proxy is listening on.

Example:

CMETHOD trebuchet socks5 127.0.0.1:19999

CMETHOD-ERROR <transport> <ErrorMessage>

The "CMETHOD-ERROR" message is used to signal that the requested PT transport could not be launched.

Example:

CMETHOD-ERROR trebuchet no rocks available

Once all PT transports have been initialized (or have failed), the PT proxy MUST send a final message indicating that it has finished initializing.

CMETHODS DONE

The "CMETHODS DONE" message signals that the PT proxy has finished initializing all of the transports that it is capable of handling.

Upon sending the "CMETHODS DONE" message, the PT proxy initialization is complete.

Notes:

- Unknown transports in "TOR_PT_CLIENT_TRANSPORTS" are ignored entirely, and MUST NOT result in a "CMETHOD-ERROR" message. Thus it is entirely possible for a given PT proxy to immediately output "CMETHODS DONE".
- Parent processes MUST handle "CMETHOD"/"CMETHOD-ERROR" messages in any order, regardless of ordering in "TOR_PT_CLIENT_TRANSPORTS".

Pluggable Transport Server Messages

PT server reverse proxies iterate over the requested transports in "TOR_PT_SERVER_TRANSPORTS" and initialize the listeners.

For each transport initialized, the PT proxy reports the listener status back to the parent via messages to stdout.

SMETHOD <transport> <address:port> [options]

The "SMETHOD" message is used to signal that a requested PT transport has been launched, the protocol which will be used to handle incoming connections, and the IP address and port that clients should use to reach the reverse-proxy.

If there is a specific address:port provided for a given PT transport via "TOR_PT_SERVER_BINDADDR", the transport MUST be initialized using that as the server address.

The OPTIONAL 'options' field is used to pass additional per-transport information back to the parent process.

The currently recognized 'options' are:

ARGS:[<Key>=<Value>,]+[<Key>=<Value>]

The "ARGS" option is used to pass additional key/value formatted information that clients will require to use the reverse proxy.

Equal signs and commas MUST be escaped with a backslash.

Tor: The ARGS are included in the transport line of the Bridge's extra-info document.

Examples:

SMETHOD trebuchet 198.51.100.1:19999
SMETHOD rot_by_N 198.51.100.1:2323 ARGS:N=13

SMETHOD-ERROR <transport> <ErrorMessage>

The "SMETHOD-ERROR" message is used to signal that the requested PT transport reverse proxy could not be launched.

Example:

SMETHOD-ERROR trebuchet no cows available

Once all PT transports have been initialized (or have failed), the PT proxy MUST send a final message indicating that it has finished initializing.

SMETHODS DONE

The "SMETHODS DONE" message signals that the PT proxy has finished initializing all of the transports that it is capable of handling.

Upon sending the "SMETHODS DONE" message, the PT proxy initialization is complete.

Pluggable Transport Log Message

This message is for a client or server PT to be able to signal back to the parent process via stdout or stderr any log messages.

A log message can be any kind of messages (human readable) that the PT sends back so the parent process can gather information about what is going on in the child process. It is not intended for the parent process to parse and act accordingly but rather a message used for plain logging.

For example, the tor daemon logs those messages at the Severity level and sends them onto the control port using the PT_LOG (see control-spec.txt) event so any third party can pick them up for debugging.

The format of the message:

LOG SEVERITY=Severity MESSAGE=Message

The SEVERITY value indicates at which logging level the message applies. The accepted values for Severity are: error, warning, notice, info, debug

The MESSAGE value is a human readable string formatted by the PT. Message contains the log message, which can be a String or CString (see section 2 in control-spec.txt).

Example:

LOG SEVERITY=debug MESSAGE="Connected to bridge A"

Pluggable Transport Status Message

This message is for a client or server PT to be able to signal back to the parent process via stdout or stderr any status messages.

The format of the message:

STATUS TRANSPORT=Transport <K_1>=<V_1> [<K_2>=<V_2> ...]

The TRANSPORT value gives a hint about what the PT is, such as its name or the protocol used. As an example, obfs4proxy would use "obfs4". Thus, the Transport value can be anything the PT itself defines and it can be a String or CString (see section 2 in control-spec.txt).

The <K_n>=<V_n> values are specific to the PT and there has to be at least one. They are messages that reflect the status that the PT wants to report. <V_n> can be a String or CString.

Examples (fictional):

STATUS TRANSPORT=obfs4 ADDRESS=198.51.100.123:1234 CONNECT=Success
STATUS TRANSPORT=obfs4 ADDRESS=198.51.100.222:2222 CONNECT=Failed FINGERPRINT= ERRSTR="Connection refused"
STATUS TRANSPORT=trebuchet ADDRESS=198.51.100.15:443 PERCENT=42
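
The key=value structure of a single STATUS line can be picked apart roughly as follows. This is a sketch, not a reference parser: parse_pt_status is a hypothetical name, keys are assumed to be alphanumeric (plus underscore), and quoted values are unwrapped without C-style unescaping.

```python
import re

# A key, "=", then either a double-quoted value or a run of non-space octets.
_PAIR_RE = re.compile(r'([A-Za-z0-9_]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_pt_status(line):
    """Hypothetical helper: return the fields of one STATUS line as a dict."""
    prefix = "STATUS "
    if not line.startswith(prefix):
        raise ValueError("not a STATUS line")
    fields = {}
    for key, value in _PAIR_RE.findall(line[len(prefix):]):
        if value.startswith('"'):
            value = value[1:-1]  # assumption: unwrap only, no unescaping
        fields[key] = value
    # TRANSPORT plus at least one PT-specific pair are required.
    if "TRANSPORT" not in fields or len(fields) < 2:
        raise ValueError("STATUS needs TRANSPORT and at least one K=V pair")
    return fields
```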

Pluggable Transport Shutdown

The recommended way for Pluggable Transport using applications and Pluggable Transports to handle graceful shutdown is as follows.

- (Parent) Set "TOR_PT_EXIT_ON_STDIN_CLOSE" (3.2.1) when launching the PT proxy, to indicate that stdin will be used for graceful shutdown notification.

- (Parent) When the time comes to terminate the PT proxy:

  1. Close the PT proxy's stdin.
  2. Wait for a "reasonable" amount of time for the PT to exit.
  3. Attempt to use OS specific mechanisms to cause graceful PT shutdown (eg: 'SIGTERM')
  4. Use OS specific mechanisms to force terminate the PT (eg: 'SIGKILL', 'TerminateProcess()').

- PT proxies SHOULD monitor stdin, and exit gracefully when it is closed, if the parent supports that behavior.

- PT proxies SHOULD handle OS specific mechanisms to gracefully terminate (eg: Install a signal handler on 'SIGTERM' that causes cleanup and a graceful shutdown if able).

- PT proxies SHOULD attempt to detect when the parent has terminated (eg: via detecting that its parent process ID has changed on U*IX systems), and gracefully terminate.
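
The parent-side steps can be sketched as below. This assumes the PT proxy was started with Python's subprocess.Popen and stdin=subprocess.PIPE; the function name shutdown_pt and the 5 second default for the "reasonable" wait are illustrative choices, not values taken from this specification.

```python
import subprocess


def shutdown_pt(proc, grace=5.0):
    """Hypothetical helper: stop a PT proxy and return its exit code."""
    # Step 1: close the PT proxy's stdin.
    if proc.stdin is not None:
        proc.stdin.close()
    # Step 2: wait a "reasonable" amount of time for the PT to exit.
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Step 3: ask for graceful termination (SIGTERM on POSIX systems).
    proc.terminate()
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # Step 4: force termination (SIGKILL on POSIX systems).
    proc.kill()
    return proc.wait()
```

Each step is attempted only if the previous one did not make the process exit within the grace period.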

Pluggable Transport Client Per-Connection Arguments

Certain PT transport protocols require that the client provides per-connection arguments when making outgoing connections. On the server side, this is handled by the "ARGS" optional argument as part of the "SMETHOD" message.

On the client side, arguments are passed via the authentication fields that are part of the SOCKS protocol.

First the "key=value" formatted arguments MUST be escaped, such that all backslash, equal sign, and semicolon characters are escaped with a backslash.

Second, all of the escaped arguments are concatenated together, separated by semicolons.

Example:

shared-secret=rahasia;secrets-file=/tmp/blob

Lastly the arguments are transmitted when making the outgoing connection using the authentication mechanism specific to the SOCKS protocol version.

- In the case of SOCKS 4, the concatenated argument list is transmitted in the "USERID" field of the "CONNECT" request.

- In the case of SOCKS 5, the parent process must negotiate "Username/Password" authentication [RFC1929], and transmit the arguments encoded in the "UNAME" and "PASSWD" fields.

  If the encoded argument list is less than 255 bytes in length, the "PLEN" field must be set to "1" and the "PASSWD" field must contain a single NUL character.
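
The escaping, concatenation, and SOCKS 5 packing steps might look like the sketch below. The function names are hypothetical, and splitting a long argument list at the 255-byte boundary between "UNAME" and "PASSWD" is an assumption; the text above only says the arguments are encoded in those two fields.

```python
def escape_pt_arg(text):
    """Backslash-escape backslash, equal sign, and semicolon characters."""
    out = []
    for ch in text:
        if ch in "\\=;":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def encode_pt_args(args):
    """Join (key, value) pairs into one semicolon-separated argument list."""
    return ";".join(escape_pt_arg(k) + "=" + escape_pt_arg(v) for k, v in args)


def socks5_auth_fields(encoded):
    """Return (UNAME, PASSWD) byte strings for RFC 1929 authentication."""
    data = encoded.encode("utf-8")
    if len(data) > 510:
        raise ValueError("argument list does not fit in UNAME + PASSWD")
    # Assumption: fill UNAME first, spill the remainder into PASSWD.
    uname, passwd = data[:255], data[255:]
    if not passwd:
        passwd = b"\x00"  # PLEN = 1, PASSWD = single NUL character
    return uname, passwd
```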

Anonymity Considerations

When designing and implementing a Pluggable Transport, care should be taken to preserve the privacy of clients and to avoid leaking personally identifying information.

Examples of client related considerations are:

- Not logging client IP addresses to disk.

- Not leaking DNS addresses except when necessary.

- Ensuring that "TOR_PT_PROXY"'s "fail closed" behavior is implemented correctly.

Additionally, certain obfuscation mechanisms rely on information such as the server IP address/port being confidential, so clients also need to take care to keep server side information confidential when applicable.

References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., Jones, L., "SOCKS Protocol Version 5", RFC 1928, March 1996.

[EXTORPORT] Kadianakis, G., Mathewson, N., "Extended ORPort and TransportControlPort", Tor Proposal 196, March 2012.

[RFC3986] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource Identifier (URI): Generic Syntax", RFC 3986, January 2005.

[RFC1929] Leech, M., "Username/Password Authentication for SOCKS V5", RFC 1929, March 1996.

Acknowledgments

This specification draws heavily from prior versions done by Jacob Appelbaum, Nick Mathewson, and George Kadianakis.

Appendix A: Example Client Pluggable Transport Session

Environment variables:

TOR_PT_MANAGED_TRANSPORT_VER=1
TOR_PT_STATE_LOCATION=/var/lib/tor/pt_state/
TOR_PT_EXIT_ON_STDIN_CLOSE=1
TOR_PT_PROXY=socks5://127.0.0.1:8001
TOR_PT_CLIENT_TRANSPORTS=obfs3,obfs4

Messages the PT Proxy writes to stdout:

VERSION 1
PROXY DONE
CMETHOD obfs3 socks5 127.0.0.1:32525
CMETHOD obfs4 socks5 127.0.0.1:37347
CMETHODS DONE

Appendix B: Example Server Pluggable Transport Session

Environment variables:

TOR_PT_MANAGED_TRANSPORT_VER=1
TOR_PT_STATE_LOCATION=/var/lib/tor/pt_state
TOR_PT_EXIT_ON_STDIN_CLOSE=1
TOR_PT_SERVER_TRANSPORTS=obfs3,obfs4
TOR_PT_SERVER_BINDADDR=obfs3-198.51.100.1:1984

Messages the PT Proxy writes to stdout:

VERSION 1
SMETHOD obfs3 198.51.100.1:1984
SMETHOD obfs4 198.51.100.1:43734 ARGS:cert=HszPy3vWfjsESCEOo9ZBkRv6zQ/1mGHzc8arF0y2SpwFr3WhsMu8rK0zyaoyERfbz3ddFw,iat-mode=0
SMETHODS DONE

                           GetTor specification
                              Jacob Appelbaum

Table of Contents

    0. Preface
    1. Overview
    2. Implementation
        2.1. Reference implementation
    3. SMTP transport
        3.1. SMTP transport security considerations
        3.2. SMTP transport privacy considerations
    4. Other transports
    5. Implementation suggestions

Preface

This document describes GetTor and how to properly implement GetTor.

Overview

GetTor was created to resolve direct and indirect censorship of Tor's software. In many countries and networks Tor's main website is blocked and would-be Tor users are unable to download even the source code to the Tor program. Other software hosted by the Tor Project is similarly censored. The filtering of the possible download sites is sometimes easy to bypass by using our TLS enabled website. In other cases the website and all of the mirrors are entirely blocked; this is a situation where a user seems to actually need Tor to fetch Tor. We discovered that it is feasible to use alternate transport methods, such as SMTP relayed through a non-trusted third party, or IRC with XDCC.

Implementation

Any compliant GetTor implementation will implement at least a single transport to meet the needs of a certain class of users. It should be i18n and l10n compliant for all user facing interactions; users should be able to manually set their language and this should serve as their preference for localization of any software delivered. The implementation must be free software and it should be freely available by request from the implementation that they interface with to download any of the other software available from that GetTor instance. Security and privacy considerations should be described on a per transport basis.

Reference implementation

We have implemented[0] a compliant GetTor that supports SMTP as a transport.

SMTP transport

The SMTP transport for GetTor should allow users to send any RFC822 compliant message in any known human language; GetTor should respond in whatever language is detected with supplementary translations in the same email. GetTor shall offer a list of all available software in the body of the email - it should offer the software as a list of packages and their subsequent descriptions.

SMTP transport security considerations

Any GetTor instance that offers SMTP as a transport should optionally implement the checking of DKIM signatures to ensure that email is not forged. Optionally GetTor should take an OpenPGP key from the user and encrypt the response with a blinded message.

SMTP transport privacy considerations

Any GetTor instance that offers SMTP as a transport must at least store the requester's address for the time that it takes to process a response. This should not be written to any permanent storage medium; GetTor should function without any long term storage excepting a cache of files that it will send to any user who requests it.

GetTor may optionally collect anonymized usage statistics to better understand how GetTor[1] is in use. This must not include any personally identifying information about any requester beyond language selection.

Other transports

At this time no other transports have been specified. IRC XDCC is a likely useful system as is XMPP/Jabber with the newest OTR file sharing transport.

Implementation suggestions

It is suggested that any compliant GetTor instance should be written in a so called "safe" language such as Python.

[0] https://gitweb.torproject.org/gettor.git
[1] https://metrics.torproject.org/packages.html

TC: A Tor control protocol (Version 1)

Table of Contents

    0. Scope
    1. Protocol outline
        1.1. Forward-compatibility
    2. Message format
        2.1. Description format
            2.1.1. Notes on an escaping bug
        2.2. Commands from controller to Tor
        2.3. Replies from Tor to the controller
        2.4. General-use tokens
    3. Commands
        3.1. SETCONF
        3.2. RESETCONF
        3.3. GETCONF
        3.4. SETEVENTS
        3.5. AUTHENTICATE
        3.6. SAVECONF
        3.7. SIGNAL
        3.8. MAPADDRESS
        3.9. GETINFO
        3.10. EXTENDCIRCUIT
        3.11. SETCIRCUITPURPOSE
        3.12. SETROUTERPURPOSE
        3.13. ATTACHSTREAM
        3.14. POSTDESCRIPTOR
        3.15. REDIRECTSTREAM
        3.16. CLOSESTREAM
        3.17. CLOSECIRCUIT
        3.18. QUIT
        3.19. USEFEATURE
        3.20. RESOLVE
        3.21. PROTOCOLINFO
        3.22. LOADCONF
        3.23. TAKEOWNERSHIP
        3.24. AUTHCHALLENGE
        3.25. DROPGUARDS
        3.26. HSFETCH
        3.27. ADD_ONION
        3.28. DEL_ONION
        3.29. HSPOST
        3.30. ONION_CLIENT_AUTH_ADD
        3.31. ONION_CLIENT_AUTH_REMOVE
        3.32. ONION_CLIENT_AUTH_VIEW
        3.33. DROPOWNERSHIP
        3.34. DROPTIMEOUTS
    4. Replies
        4.1. Asynchronous events
            4.1.1. Circuit status changed
            4.1.2. Stream status changed
            4.1.3. OR Connection status changed
            4.1.4. Bandwidth used in the last second
            4.1.5. Log messages
            4.1.6. New descriptors available
            4.1.7. New Address mapping
            4.1.8. Descriptors uploaded to us in our role as authoritative dirserver
            4.1.9. Our descriptor changed
            4.1.10. Status events
            4.1.11. Our set of guard nodes has changed
            4.1.12. Network status has changed
            4.1.13. Bandwidth used on an application stream
            4.1.14. Per-country client stats
            4.1.15. New consensus networkstatus has arrived
            4.1.16. New circuit buildtime has been set
            4.1.17. Signal received
            4.1.18. Configuration changed
            4.1.19. Circuit status changed slightly
            4.1.20. Pluggable transport launched
            4.1.21. Bandwidth used on an OR or DIR or EXIT connection
            4.1.22. Bandwidth used by all streams attached to a circuit
            4.1.23. Per-circuit cell stats
            4.1.24. Token buckets refilled
            4.1.25. HiddenService descriptors
            4.1.26. HiddenService descriptors content
            4.1.27. Network liveness has changed
            4.1.28. Pluggable Transport Logs
            4.1.29. Pluggable Transport Status
    5. Implementation notes
        5.1. Authentication
        5.2. Don't let the buffer get too big
        5.3. Backward compatibility with v0 control protocol
        5.4. Tor config options for use by controllers
        5.5. Phases from the Bootstrap status event
            5.5.1. Overview of Bootstrap reporting
            5.5.2. Phases in Bootstrap Stage 1
            5.5.3. Phases in Bootstrap Stage 2
            5.5.4. Phases in Bootstrap Stage 3
        5.6. Bootstrap phases reported by older versions of Tor

Scope

This document describes an implementation-specific protocol that is used for other programs (such as frontend user-interfaces) to communicate with a locally running Tor process. It is not part of the Tor onion routing protocol.

This protocol replaces version 0 of TC, which is now deprecated. For reference, version 0 of TC is described in "control-spec-v0.txt". Implementors are recommended to avoid using TC directly, but instead to use a library that can easily be updated to use the newer protocol. (Version 0 is used by Tor versions 0.1.0.x; the protocol in this document only works with Tor versions in the 0.1.1.x series and later.)

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Protocol outline

TC is a bidirectional message-based protocol. It assumes an underlying stream for communication between a controlling process (the "client" or "controller") and a Tor process (or "server"). The stream may be implemented via TCP, TLS-over-TCP, a Unix-domain socket, or so on, but it must provide reliable in-order delivery. For security, the stream should not be accessible by untrusted parties.

In TC, the client and server send typed messages to each other over the underlying stream. The client sends "commands" and the server sends "replies".

By default, all messages from the server are in response to messages from the client. Some client requests, however, will cause the server to send messages to the client indefinitely far into the future. Such "asynchronous" replies are marked as such.

Servers respond to messages in the order messages are received.

Forward-compatibility

This is an evolving protocol; new client and server behavior will be allowed in future versions. To allow new backward-compatible behavior on behalf of the client, we may add new commands and allow existing commands to take new arguments in future versions. To allow new backward-compatible server behavior, we note various places below where servers speaking a future version of this protocol may insert new data, and note that clients should/must "tolerate" unexpected elements in these places. There are two ways that we do this:

* Adding a new field to a message:

  For example, we might say "This message has three space-separated fields; clients MUST tolerate more fields." This means that a client MUST NOT crash or otherwise fail to parse the message or other subsequent messages when there are more than three fields, and that it SHOULD function at least as well when more fields are provided as it does when it only gets the fields it accepts. The most obvious way to do this is by ignoring additional fields; the next-most-obvious way is to report additional fields verbatim to the user, perhaps as part of an expert UI.

* Adding a new possible value to a list of alternatives:

  For example, we might say "This field will be OPEN, CLOSED, or CONNECTED. Clients MUST tolerate unexpected values." This means that a client MUST NOT crash or otherwise fail to parse the message or other subsequent messages when there are unexpected values, and that it SHOULD try to handle the rest of the message as well as it can. The most obvious way to do this is by pretending that each list of alternatives has an additional "unrecognized value" element, and mapping any unrecognized values to that element; the next-most-obvious way is to create a separate "unrecognized value" element for each unrecognized value.

  Clients SHOULD NOT "tolerate" unrecognized alternatives by pretending that the message containing them is absent. For example, a stream closed for an unrecognized reason is nevertheless closed, and should be reported as such.

  (If some list of alternatives is given, and there isn't an explicit statement that clients must tolerate unexpected values, clients still must tolerate unexpected values. The only exception would be if there were an explicit statement that no future values will ever be added.)
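
Both kinds of tolerance can be illustrated with a short sketch. The message layout and the names ConnState, parse_state, and parse_three_fields are hypothetical and simply mirror the two examples quoted above; they do not correspond to any real TC message.

```python
from enum import Enum


class ConnState(Enum):
    OPEN = "OPEN"
    CLOSED = "CLOSED"
    CONNECTED = "CONNECTED"
    # Extra element that stands in for any value this client does not know.
    UNRECOGNIZED = "UNRECOGNIZED"


def parse_state(token):
    """Map an unexpected alternative to UNRECOGNIZED instead of failing."""
    try:
        return ConnState(token)
    except ValueError:
        return ConnState.UNRECOGNIZED


def parse_three_fields(line):
    """Return (known_fields, extra_fields); extra fields are kept, not fatal."""
    fields = line.split(" ")
    if len(fields) < 3:
        raise ValueError("expected at least three fields")
    return fields[:3], fields[3:]
```

The caller still receives the message when a value is unrecognized, so an event such as a stream closing is reported even if its reason is unknown.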

Message format

Description format

The message formats listed below use ABNF as described in RFC 2234. The protocol itself is loosely based on SMTP (see RFC 2821).

We use the following nonterminals from RFC 2822: atom, qcontent

We define the following general-use nonterminals:

QuotedString = DQUOTE *qcontent DQUOTE

There are explicitly no limits on line length. All 8-bit characters are permitted unless explicitly disallowed. In QuotedStrings, backslashes and quotes must be escaped; other characters need not be escaped.

Wherever CRLF is specified to be accepted from the controller, Tor MAY also accept LF. Tor, however, MUST NOT generate LF instead of CRLF. Controllers SHOULD always send CRLF.

Notes on an escaping bug

CString = DQUOTE *qcontent DQUOTE

Note that although these nonterminals have the same grammar, they are interpreted differently. In a QuotedString, a backslash followed by any character represents that character. But in a CString, the escapes "\n", "\t", "\r", and the octal escapes "\0" ... "\377" represent newline, tab, carriage return, and the 256 possible octet values respectively.

The use of CString in this document reflects a bug in Tor; they should have been QuotedString instead. In the future, they may migrate to use QuotedString instead. If they do, the QuotedString implementation will never place a backslash before a "n", "t", "r", or digit, to ensure that old controllers don't get confused.

For future-proofing, controller implementors MAY use the following rules to be compatible with buggy Tor implementations and with future ones that implement the spec as intended:

Read \n \t \r and \0 ... \377 as C escapes.
Treat a backslash followed by any other character as that character.

Currently, many of the QuotedString instances below that Tor outputs are in fact CStrings. We intend to fix this in future versions of Tor, and document which ones were broken. (See bugtracker ticket #14555 for a bit more information.)

Note that this bug exists only in strings generated by Tor for the Tor controller; Tor should parse input QuotedStrings from the controller correctly.
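
The two compatibility rules suggested above for controller implementors can be sketched as a decoder for the text between the double quotes. The function name is hypothetical, and treating an octal escape as at most three digits with a value no larger than 0377 is an assumption about how the "\0" ... "\377" range is meant to be read.

```python
def unescape_tolerant(body):
    """Decode the bytes found between the quotes of a QuotedString/CString."""
    simple = {ord("n"): 10, ord("t"): 9, ord("r"): 13}
    octal = b"01234567"
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != 0x5C:  # not a backslash: copy the octet through
            out.append(body[i])
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt in simple:
            # Rule 1: \n, \t, \r are C escapes.
            out.append(simple[nxt])
            i += 2
        elif nxt in octal:
            # Rule 1: up to three octal digits, value at most 0377.
            j = i + 1
            while j < len(body) and j - i <= 3 and body[j] in octal:
                j += 1
            if int(body[i + 1:j].decode("ascii"), 8) > 255:
                j -= 1
            out.append(int(body[i + 1:j].decode("ascii"), 8))
            i = j
        else:
            # Rule 2: backslash followed by anything else is that character.
            out.append(nxt)
            i += 2
    return bytes(out)
```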

Commands from controller to Tor

Command = Keyword OptArguments CRLF / "+" Keyword OptArguments CRLF CmdData
Keyword = 1*ALPHA
OptArguments = [ SP *(SP / VCHAR) ]

A command is either a single line containing a Keyword and arguments, or a multiline command whose initial keyword begins with +, and whose data section ends with a single "." on a line of its own. (We use a special character to distinguish multiline commands so that Tor can correctly parse multi-line commands that it does not recognize.) Specific commands and their arguments are described below in section 3.

Replies from Tor to the controller

Reply = SyncReply / AsyncReply
SyncReply = *(MidReplyLine / DataReplyLine) EndReplyLine
AsyncReply = *(MidReplyLine / DataReplyLine) EndReplyLine

MidReplyLine = StatusCode "-" ReplyLine
DataReplyLine = StatusCode "+" ReplyLine CmdData
EndReplyLine = StatusCode SP ReplyLine
ReplyLine = [ReplyText] CRLF
ReplyText = XXXX
StatusCode = 3DIGIT

Unless specified otherwise, multiple lines in a single reply from Tor to the controller are guaranteed to share the same status code. Specific replies are mentioned below in section 3, and described more fully in section 4.

[Compatibility note: versions of Tor before 0.2.0.3-alpha sometimes generate AsyncReplies of the form "*(MidReplyLine / DataReplyLine)". This is incorrect, but controllers that need to work with these versions of Tor should be prepared to get multi-line AsyncReplies with the final line (usually "650 OK") omitted.]
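
A controller could group the lines of one reply as sketched below. The sketch assumes the input has already been split on CRLF and contains exactly one complete reply; parse_reply is a hypothetical name, and the compatibility case of a missing final line is not handled.

```python
def parse_reply(lines):
    """Return a list of (status_code, text, data) for one reply.

    data is None for Mid/End lines and a list of lines for data lines.
    """
    result = []
    it = iter(lines)
    for line in it:
        if (len(line) < 4
                or any(c not in "0123456789" for c in line[:3])
                or line[3] not in "-+ "):
            raise ValueError("malformed reply line: " + repr(line))
        code, sep, text = int(line[:3]), line[3], line[4:]
        data = None
        if sep == "+":
            # DataReplyLine: collect CmdData up to the lone "." line.
            data = []
            for body in it:
                if body == ".":
                    break
                # Undo the leading-period escaping.
                data.append(body[1:] if body.startswith(".") else body)
            else:
                raise ValueError("unterminated data section")
        result.append((code, text, data))
        if sep == " ":
            return result  # EndReplyLine closes the reply
    raise ValueError("missing EndReplyLine")
```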

General-use tokens

; CRLF means, "the ASCII Carriage Return character (decimal value 13)
; followed by the ASCII Linefeed character (decimal value 10)."
CRLF = CR LF

; How a controller tells Tor about a particular OR. There are four
; possible formats:
;    $Fingerprint -- The router whose identity key hashes to the fingerprint.
;        This is the preferred way to refer to an OR.
;    $Fingerprint~Nickname -- The router whose identity key hashes to the
;        given fingerprint, but only if the router has the given nickname.
;    $Fingerprint=Nickname -- The router whose identity key hashes to the
;        given fingerprint, but only if the router is Named and has the given
;        nickname.
;    Nickname -- The Named router with the given nickname, or, if no such
;        router exists, any router whose nickname matches the one given.
;        This is not a safe way to refer to routers, since Named status
;        could under some circumstances change over time.
;
; The tokens that implement the above follow:

ServerSpec = LongName / Nickname
LongName = Fingerprint [ "~" Nickname ]

; For tors older than 0.3.1.3-alpha, LongName may have included an equal
; sign ("=") in lieu of a tilde ("~"). The presence of an equal sign
; denoted that the OR possessed the "Named" flag:

LongName = Fingerprint [ ( "=" / "~" ) Nickname ]

Fingerprint = "$" 40HEXDIG
NicknameChar = "a"-"z" / "A"-"Z" / "0" - "9"
Nickname = 1*19 NicknameChar
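
As an illustration, the ServerSpec forms can be recognized with two regular expressions. The function name and the shape of the returned dictionary are hypothetical; the sketch accepts the older "=" form as well as "~", and reports "named" as None when only a nickname is given, since that form carries no flag.

```python
import re

_LONGNAME = re.compile(r"\$([0-9A-Fa-f]{40})(?:([~=])([A-Za-z0-9]{1,19}))?")
_NICKNAME = re.compile(r"[A-Za-z0-9]{1,19}")


def parse_server_spec(spec):
    """Hypothetical helper: classify a ServerSpec string."""
    m = _LONGNAME.fullmatch(spec)
    if m:
        return {
            "fingerprint": m.group(1).upper(),
            "nickname": m.group(3),
            # "=" was used by older Tor versions to mark a Named router.
            "named": m.group(2) == "=",
        }
    if _NICKNAME.fullmatch(spec):
        return {"fingerprint": None, "nickname": spec, "named": None}
    raise ValueError("not a ServerSpec: " + repr(spec))
```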

; What follows is an outdated way to refer to ORs.
; Feature VERBOSE_NAMES replaces ServerID with LongName in events and
; GETINFO results. VERBOSE_NAMES can be enabled starting in Tor version
; 0.1.2.2-alpha and it is always-on in 0.2.2.1-alpha and later.
ServerID = Nickname / Fingerprint

; Unique identifiers for streams or circuits. Currently, Tor only
; uses digits, but this may change
StreamID = 1*16 IDChar
CircuitID = 1*16 IDChar
ConnID = 1*16 IDChar
QueueID = 1*16 IDChar
IDChar = ALPHA / DIGIT

Address = ip4-address / ip6-address / hostname (XXXX Define these)

; A "CmdData" section is a sequence of octets concluded by the terminating
; sequence CRLF "." CRLF. The terminating sequence may not appear in the
; body of the data. Leading periods on lines in the data are escaped with
; an additional leading period as in RFC 2821 section 4.5.2.
CmdData = *DataLine "." CRLF
DataLine = CRLF / "." 1*LineItem CRLF / NonDotItem *LineItem CRLF
LineItem = NonCR / 1*CR NonCRLF
NonDotItem = NonDotCR / 1*CR NonCRLF
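
The leading-period escaping of a CmdData section can be sketched as an encoder and decoder pair. The function names are hypothetical, and for simplicity the encoder rejects any CR or LF inside a line, which is stricter than the grammar above.

```python
def encode_cmd_data(lines):
    """Encode a list of byte-string lines as a CmdData section."""
    out = bytearray()
    for line in lines:
        if b"\r" in line or b"\n" in line:
            raise ValueError("line breaks inside a line are not supported here")
        if line.startswith(b"."):
            out += b"."  # escape a leading period with another period
        out += line + b"\r\n"
    out += b".\r\n"  # terminating "." CRLF
    return bytes(out)


def decode_cmd_data(data):
    """Inverse of encode_cmd_data for well-formed input."""
    if not data.endswith(b".\r\n"):
        raise ValueError("missing terminating sequence")
    body = data[:-3]
    if body and not body.endswith(b"\r\n"):
        raise ValueError("terminator must start on its own line")
    lines = body.split(b"\r\n")[:-1]
    return [ln[1:] if ln.startswith(b".") else ln for ln in lines]
```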

; ISOTime, ISOTime2, and ISOTime2Frac are time formats as specified in
; ISO8601.
;  example ISOTime:      "2012-01-11 12:15:33"
;  example ISOTime2:     "2012-01-11T12:15:33"
;  example ISOTime2Frac: "2012-01-11T12:15:33.51"
IsoDatePart = 4DIGIT "-" 2DIGIT "-" 2DIGIT
IsoTimePart = 2DIGIT ":" 2DIGIT ":" 2DIGIT
ISOTime = IsoDatePart " " IsoTimePart
ISOTime2 = IsoDatePart "T" IsoTimePart
ISOTime2Frac = IsoTime2 [ "." 1*DIGIT ]
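
A controller written in Python might read these three formats with datetime.strptime, as in the sketch below. The function name is hypothetical. Note two limits of this approach: strptime is more lenient than the grammar (it accepts single-digit months, for instance), and its "%f" directive accepts at most six fractional digits.

```python
from datetime import datetime

_FORMATS = (
    "%Y-%m-%d %H:%M:%S",      # ISOTime
    "%Y-%m-%dT%H:%M:%S",      # ISOTime2
    "%Y-%m-%dT%H:%M:%S.%f",   # ISOTime2Frac with a fractional part
)


def parse_iso_time(text):
    """Return a naive datetime for an ISOTime, ISOTime2, or ISOTime2Frac."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized time format: " + repr(text))
```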

; Numbers
LeadingDigit = "1" - "9"
UInt = LeadingDigit *Digit

Commands

All commands are case-insensitive, but most keywords are case-sensitive.

SETCONF

Change the value of one or more configuration variables. The syntax is:

"SETCONF" 1*(SP keyword ["=" value]) CRLF
value = String / QuotedString

Tor behaves as though it had just read each of the key-value pairs from its configuration file. Keywords with no corresponding values have their configuration values reset to 0 or NULL (use RESETCONF if you want to set it back to its default). SETCONF is all-or-nothing: if there is an error in any of the configuration settings, Tor sets none of them.

Tor responds with a "250 OK" reply on success. If some of the listed keywords can't be found, Tor replies with a "552 Unrecognized option" message. Otherwise, Tor responds with a "513 syntax error in configuration values" reply on syntax error, or a "553 impossible configuration setting" reply on a semantic error.

Some configuration options (e.g. "Bridge") take multiple values. Also, some configuration keys (e.g. for hidden services and for entry guard lists) form a context-sensitive group where order matters (see GETCONF below). In these cases, setting any of the options in a SETCONF command is taken to reset all of the others. For example, if two ORListenAddress values are configured, and a SETCONF command arrives containing a single ORListenAddress value, the new command's value replaces the two old values.

Sometimes it is not possible to change configuration options solely by issuing a series of SETCONF commands, because the value of one of the configuration options depends on the value of another which has not yet been set. Such situations can be overcome by setting multiple configuration options with a single SETCONF command (e.g. SETCONF ORPort=443 ORListenAddress=9001).
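
Building a single SETCONF line that carries several options might look like the sketch below. The function names are hypothetical, and the choice to always send values as QuotedStrings (escaping backslashes and double quotes) is a simplification that the syntax above permits; it is not the only valid encoding.

```python
def quote(value):
    """Render a value as a QuotedString: escape backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_setconf(options):
    """Build one SETCONF command from (keyword, value-or-None) pairs."""
    if not options:
        raise ValueError("SETCONF needs at least one keyword")
    parts = ["SETCONF"]
    for keyword, value in options:
        if value is None:
            parts.append(keyword)  # keyword alone: reset to 0 or NULL
        else:
            if "\r" in value or "\n" in value:
                raise ValueError("line breaks are not supported in this sketch")
            parts.append(keyword + "=" + quote(value))
    return " ".join(parts) + "\r\n"
```

Because SETCONF is all-or-nothing, sending interdependent options in one command means they are accepted or rejected together.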

RESETCONF

Remove all settings for a given configuration option entirely, assign its default value (if any), and then assign the String provided. Typically the String is left empty, to simply set an option back to its default. The syntax is:

"RESETCONF" 1*(SP keyword ["=" String]) CRLF

Otherwise it behaves like SETCONF above.

GETCONF

Request the value of zero or more configuration variable(s). The syntax is:

"GETCONF" *(SP keyword) CRLF

If all of the listed keywords exist in the Tor configuration, Tor replies with a series of reply lines of the form:

250 keyword=value

If any option is set to a 'default' value semantically different from an empty string, Tor may reply with a reply line of the form:

250 keyword

Value may be a raw value or a quoted string. Tor will try to use unquoted values except when the value could be misinterpreted through not being quoted. (Right now, Tor supports no such misinterpretable values for configuration options.)

If some of the listed keywords can't be found, Tor replies with a "552 unknown configuration keyword" message.

If an option appears multiple times in the configuration, all of its key-value pairs are returned in order.

If no keywords were provided, Tor responds with a "250 OK" message.

Some options are context-sensitive, and depend on other options with different keywords. These cannot be fetched directly. Currently there is only one such option: clients should use the "HiddenServiceOptions" virtual keyword to get all HiddenServiceDir, HiddenServicePort, HiddenServiceVersion, and HiddenserviceAuthorizeClient option settings.
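
The successful GETCONF replies described above can be collected into a mapping as sketched here. The function name is hypothetical; the input is assumed to be the reply lines with CRLF removed, quoted values are returned without unquoting, and a keyword reported without a value is recorded as None.

```python
def parse_getconf(lines):
    """Return {keyword: [values...]} from the lines of a 250 GETCONF reply."""
    if lines == ["250 OK"]:
        return {}  # no keywords were requested
    values = {}
    for line in lines:
        if line[:3] != "250" or line[3:4] not in ("-", " "):
            raise ValueError("unexpected reply line: " + repr(line))
        keyword, eq, value = line[4:].partition("=")
        # An option that appears several times keeps all values, in order.
        values.setdefault(keyword, []).append(value if eq else None)
    return values
```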

SETEVENTS

Request the server to inform the client about interesting events. The syntax is:

"SETEVENTS" [SP "EXTENDED"] *(SP EventCode) CRLF

EventCode = 1*(ALPHA / "_") (see section 4.1.x for event types)

Any events not listed in the SETEVENTS line are turned off; thus, sending SETEVENTS with an empty body turns off all event reporting.

The server responds with a "250 OK" reply on success, and a "552 Unrecognized event" reply if one of the event codes isn't recognized. (On error, the list of active event codes isn't changed.)

If the flag string "EXTENDED" is provided, Tor may provide extra information with events for this connection; see 4.1 for more information. NOTE: All events on a given connection will be provided in extended format, or none. NOTE: "EXTENDED" was first supported in Tor 0.1.1.9-alpha; it is always-on in Tor 0.2.2.1-alpha and later.

Each event is described in more detail in Section 4.1.

AUTHENTICATE

Sent from the client to the server. The syntax is:

"AUTHENTICATE" [ SP 1*HEXDIG / QuotedString ] CRLF

This command is used to authenticate to the server. The provided string is one of the following:

* (For the HASHEDPASSWORD authentication method; see 3.21) The original password represented as a QuotedString.

* (For the COOKIE authentication method; see 3.21) The contents of the cookie file, formatted in hexadecimal

* (For the SAFECOOKIE authentication method; see 3.21) The HMAC based on the AUTHCHALLENGE message, in hexadecimal.

The server responds with "250 OK" on success or "515 Bad authentication" if the authentication cookie is incorrect. Tor closes the connection on an authentication failure.

The authentication token can be specified as either a quoted ASCII string, or as an unquoted hexadecimal encoding of that same string (to avoid escaping issues).

For information on how the implementation securely stores authentication information on disk, see section 5.1.

Before the client has authenticated, no command other than PROTOCOLINFO, AUTHCHALLENGE, AUTHENTICATE, or QUIT is valid. If the controller sends any other command, or sends a malformed command, or sends an unsuccessful AUTHENTICATE command, or sends PROTOCOLINFO or AUTHCHALLENGE more than once, Tor sends an error reply and closes the connection.

To prevent some cross-protocol attacks, the AUTHENTICATE command is still required even if all authentication methods in Tor are disabled. In this case, the controller should just send "AUTHENTICATE" CRLF.

(Versions of Tor before 0.1.2.16 and 0.2.0.4-alpha did not close the connection after an authentication failure.)
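
Since the token may always be sent as unquoted hexadecimal, a controller can avoid quoting issues entirely. The sketch below takes that route; authenticate_command is a hypothetical name, and the token is whatever bytes the chosen method calls for (password, cookie contents, or HMAC).

```python
def authenticate_command(token=None):
    """Build an AUTHENTICATE command; token is raw bytes or None."""
    if token is None:
        # All authentication methods disabled: the bare command is still sent.
        return b"AUTHENTICATE\r\n"
    return b"AUTHENTICATE " + token.hex().encode("ascii") + b"\r\n"
```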

SAVECONF

Sent from the client to the server. The syntax is:

"SAVECONF" [SP "FORCE"] CRLF

Instructs the server to write out its config options into its torrc. Server returns "250 OK" if successful, or "551 Unable to write configuration to disk" if it can't write the file or some other error occurs.

If the %include option is used on torrc, SAVECONF will not write the configuration to disk. If the flag string "FORCE" is provided, the configuration will be overwritten even if %include is used. Using %include on defaults-torrc does not affect SAVECONF. (Introduced in 0.3.1.1-alpha.)

See also the "getinfo config-text" command, if the controller wants to write the torrc file itself.

See also the "getinfo config-can-saveconf" command, to tell if the FORCE flag will be required. (Also introduced in 0.3.1.1-alpha.)

SIGNAL

Sent from the client to the server. The syntax is:

"SIGNAL" SP Signal CRLF

Signal = "RELOAD" / "SHUTDOWN" / "DUMP" / "DEBUG" / "HALT" /
         "HUP" / "INT" / "USR1" / "USR2" / "TERM" / "NEWNYM" /
         "CLEARDNSCACHE" / "HEARTBEAT" / "ACTIVE" / "DORMANT"

The meanings of the signals are:

RELOAD -- Reload: reload config items.

SHUTDOWN -- Controlled shutdown: if server is an OP, exit immediately. If it's an OR, close listeners and exit after ShutdownWaitLength seconds.

DUMP -- Dump stats: log information about open connections and circuits.

DEBUG -- Debug: switch all open logs to loglevel debug.

HALT -- Immediate shutdown: clean up and exit now.

CLEARDNSCACHE -- Forget the client-side cached IPs for all hostnames.

NEWNYM -- Switch to clean circuits, so new application requests don't share any circuits with old ones. Also clears the client-side DNS cache. (Tor MAY rate-limit its response to this signal.)

HEARTBEAT -- Make Tor dump an unscheduled Heartbeat message to log.

DORMANT -- Tell Tor to become "dormant". A dormant Tor will try to avoid CPU and network usage until it receives a user-initiated network request. (Don't use this on relays or hidden services yet!)

ACTIVE -- Tell Tor to stop being "dormant", as if it had received a user-initiated network request.

The server responds with "250 OK" if the signal is recognized (or simply closes the socket if it was asked to close immediately), or "552 Unrecognized signal" if the signal is unrecognized.

Note that not all of these signals have POSIX signal equivalents. The ones that do are listed below. You may also use these POSIX names for the signals that have them.

RELOAD: HUP
SHUTDOWN: INT
HALT: TERM
DUMP: USR1
DEBUG: USR2

[SIGNAL DORMANT and SIGNAL ACTIVE were added in 0.4.0.1-alpha.]
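
A controller library might normalize the POSIX names before sending, as in this sketch. The table is taken from the list above; the function name and the choice to always send the canonical name are illustrative only, since Tor accepts either form.

```python
# POSIX signal name -> equivalent control-protocol signal.
POSIX_ALIASES = {
    "HUP": "RELOAD",
    "INT": "SHUTDOWN",
    "TERM": "HALT",
    "USR1": "DUMP",
    "USR2": "DEBUG",
}

SIGNALS = {
    "RELOAD", "SHUTDOWN", "DUMP", "DEBUG", "HALT", "NEWNYM",
    "CLEARDNSCACHE", "HEARTBEAT", "ACTIVE", "DORMANT",
}


def signal_command(name):
    """Build a SIGNAL command, mapping POSIX names to canonical ones."""
    canonical = POSIX_ALIASES.get(name, name)
    if canonical not in SIGNALS:
        raise ValueError("unrecognized signal: " + name)
    return "SIGNAL " + canonical + "\r\n"
```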

MAPADDRESS

Sent from the client to the server. The syntax is:

"MAPADDRESS" 1*(Address "=" Address SP) CRLF

The first address in each pair is an "original" address; the second is a "replacement" address. The client sends this message to the server in order to tell it that future SOCKS requests for connections to the original address should be replaced with connections to the specified replacement address. If the addresses are well-formed, and the server is able to fulfill the request, the server replies with a 250 message:

250-OldAddress1=NewAddress1
250 OldAddress2=NewAddress2

containing the source and destination addresses. If the request is malformed, the server replies with "512 syntax error in command argument". If the server can't fulfill the request, it replies with "451 resource exhausted".

The client may decline to provide a body for the original address, and instead send a special null address ("0.0.0.0" for IPv4, "::0" for IPv6, or "." for hostname), signifying that the server should choose the original address itself, and return that address in the reply. The server should ensure that it returns an element of address space that is unlikely to be in actual use. If there is already an address mapped to the destination address, the server may reuse that mapping.

If the original address is already mapped to a different address, the old mapping is removed. If the original address and the destination address are the same, the server removes any mapping in place for the original address.

Example:

C: MAPADDRESS 1.2.3.4=torproject.org
S: 250 1.2.3.4=torproject.org

C: GETINFO address-mappings/control
S: 250-address-mappings/control=1.2.3.4 torproject.org NEVER
S: 250 OK

C: MAPADDRESS 1.2.3.4=1.2.3.4
S: 250 1.2.3.4=1.2.3.4

C: GETINFO address-mappings/control
S: 250-address-mappings/control=
S: 250 OK

{Note: This feature is designed to be used to help Tor-ify applications that need to use SOCKS4 or hostname-less SOCKS5. There are three approaches to doing this:

1. Somehow make them use SOCKS4a or SOCKS5-with-hostnames instead.

2. Use tor-resolve (or another interface to Tor's resolve-over-SOCKS feature) to resolve the hostname remotely. This doesn't work with special addresses like x.onion or x.y.exit.

3. Use MAPADDRESS to map an IP address to the desired hostname, and then arrange to fool the application into thinking that the hostname has resolved to that IP.

This functionality is designed to help implement the 3rd approach.}

Mappings set by the controller last until the Tor process exits: they never expire. If the controller wants the mapping to last only a certain time, then it must explicitly un-map the address when that time has elapsed.

MapAddress replies MAY contain mixed status codes.

Example:

C: MAPADDRESS xxx=@@@ 0.0.0.0=bogus1.google.com
S: 512-syntax error: invalid address '@@@'
S: 250 127.199.80.246=bogus1.google.com
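
The MAPADDRESS exchange above can be sketched in Python as follows. This is a minimal illustration, not part of the protocol: the function names are hypothetical, reply lines are assumed to arrive with their CRLF already stripped, and pairs are joined with single spaces.

```python
def build_mapaddress(pairs):
    """Build a MAPADDRESS command line from (original, replacement) pairs."""
    if not pairs:
        raise ValueError("MAPADDRESS needs at least one address pair")
    body = " ".join(f"{original}={replacement}" for original, replacement in pairs)
    return f"MAPADDRESS {body}\r\n"


def parse_mapaddress_reply(lines):
    """Split reply lines into accepted mappings and per-pair errors.

    Each line is "<code><sep><text>", where <sep> is "-" on every line
    except the last. A 250 line carries "original=replacement"; any other
    code is collected as an error, since replies MAY mix status codes.
    """
    mappings, errors = {}, []
    for line in lines:
        code, text = line[:3], line[4:]
        if code == "250":
            original, _, replacement = text.partition("=")
            mappings[original] = replacement
        else:
            errors.append((int(code), text))
    return mappings, errors
```

Because a single reply can carry both 250 and 512 lines, the parser returns both collections rather than raising on the first error.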

GETINFO

Sent from the client to the server. The syntax is as for GETCONF:

"GETINFO" 1*(SP keyword) CRLF

Unlike GETCONF, this message is used for data that are not stored in the Tor configuration file, and that may be longer than a single line. On success, one ReplyLine is sent for each requested value, followed by a final 250 OK ReplyLine. If a value fits on a single line, the format is:

250-keyword=value

If a value must be split over multiple lines, the format is:

250+keyword=
value
.

The server sends a 551 or 552 error on failure.

Recognized keys and their values include:

"version" -- The version of the server's software, which MAY include the name of the software, such as "Tor 0.0.9.4". The name of the software, if absent, is assumed to be "Tor".

"config-file" -- The location of Tor's configuration file ("torrc").

"config-defaults-file" -- The location of Tor's configuration defaults file ("torrc.defaults"). This file gets parsed before torrc, and is typically used to replace Tor's default configuration values. [First implemented in 0.2.3.9-alpha.]

"config-text" -- The contents that Tor would write if you send it a SAVECONF command, so the controller can write the file to disk itself. [First implemented in 0.2.2.7-alpha.]

"exit-policy/default" -- The default exit policy lines that Tor will *append* to the ExitPolicy config option.

"exit-policy/reject-private/default" -- The default exit policy lines that Tor will *prepend* to the ExitPolicy config option when ExitPolicyRejectPrivate is 1.

"exit-policy/reject-private/relay" -- The relay-specific exit policy lines that Tor will *prepend* to the ExitPolicy config option based on the current values of ExitPolicyRejectPrivate and ExitPolicyRejectLocalInterfaces. These lines are based on the public addresses configured in the torrc and present on the relay's interfaces. Will send a 552 error if the server is not running as an onion router. Will send a 551 error on an internal error, which may be transient.

"exit-policy/ipv4"
"exit-policy/ipv6"
"exit-policy/full" -- This OR's exit policy, in IPv4-only, IPv6-only, or all-entries flavors. Handles errors in the same way as "exit-policy/reject-private/relay" does.

"desc/id/<OR identity>" or "desc/name/<OR nickname>" -- the latest server descriptor for a given OR.
(Note that modern Tor clients do not download server descriptors by default, but download microdescriptors instead. If microdescriptors are enabled, you'll need to use "md" instead.)

"md/all" -- all known microdescriptors for the entire Tor network. Each microdescriptor is terminated by a newline. [First implemented in 0.3.5.1-alpha]

"md/id/<OR identity>" or "md/name/<OR nickname>" -- the latest microdescriptor for a given OR. Empty if we have no microdescriptor for that OR (because we haven't downloaded one, or it isn't in the consensus). [First implemented in 0.2.3.8-alpha.]

"desc/download-enabled" -- "1" if we try to download router descriptors; "0" otherwise. [First implemented in 0.3.2.1-alpha]

"md/download-enabled" -- "1" if we try to download microdescriptors; "0" otherwise. [First implemented in 0.3.2.1-alpha]

"dormant" -- A nonnegative integer: zero if Tor is currently active and building circuits, and nonzero if Tor has gone idle due to lack of use or some similar reason. [First implemented in 0.2.3.16-alpha]

"desc-annotations/id/<OR identity>" -- outputs the annotations string (source, timestamp of arrival, purpose, etc) for the corresponding descriptor. [First implemented in 0.2.0.13-alpha.]

"extra-info/digest/<digest>" -- the extrainfo document whose digest (in hex) is <digest>. Only available if we're downloading extra-info documents.

"ns/id/<OR identity>" or "ns/name/<OR nickname>" -- the latest router status info (v3 directory style) for a given OR. Router status info is as given in dir-spec.txt, and reflects the latest consensus opinion about the router in question. Like directory clients, controllers MUST tolerate unrecognized flags and lines. The published date and descriptor digest are those believed to be best by this Tor, not necessarily those for a descriptor that Tor currently has. [First implemented in 0.1.2.3-alpha.]
[In 0.2.0.9-alpha this switched from v2 directory style to v3]

"ns/all" -- Router status info (v3 directory style) for all ORs that the consensus has an opinion about, joined by newlines. [First implemented in 0.1.2.3-alpha.] [In 0.2.0.9-alpha this switched from v2 directory style to v3]

"ns/purpose/<purpose>" -- Router status info (v3 directory style) for all ORs of this purpose. Mostly designed for /ns/purpose/bridge queries. [First implemented in 0.2.0.13-alpha.] [In 0.2.0.9-alpha this switched from v2 directory style to v3] [In versions before 0.4.1.1-alpha we set the Running flag on bridges when /ns/purpose/bridge is accessed] [In 0.4.1.1-alpha we set the Running flag on bridges when the bridge networkstatus file is written to disk]

"desc/all-recent" -- the latest server descriptor for every router that Tor knows about. (See md note about "desc/id" and "desc/name" above.)

"network-status" -- [Deprecated in 0.3.1.1-alpha, removed in 0.4.5.1-alpha.]

"address-mappings/all"
"address-mappings/config"
"address-mappings/cache"
"address-mappings/control" -- a \r\n-separated list of address mappings, each in the form of "from-address to-address expiry". The 'config' key returns those address mappings set in the configuration; the 'cache' key returns the mappings in the client-side DNS cache; the 'control' key returns the mappings set via the control interface; the 'all' target returns the mappings set through any mechanism. Expiry is formatted as with ADDRMAP events, except that "expiry" is always a time in UTC or the string "NEVER"; see section 4.1.7. First introduced in 0.2.0.3-alpha.

"addr-mappings/*" -- as for address-mappings/*, but without the expiry portion of the value. Use of this value is deprecated since 0.2.0.3-alpha; use address-mappings instead.

"address" -- the best guess at our external IP address. If we have no guess, return a 551 error.
(Added in 0.1.2.2-alpha)

"address/v4"
"address/v6"
the best guess at our respective external IPv4 or IPv6 address. If we have no guess, return a 551 error. (Added in 0.4.5.1-alpha)

"fingerprint" -- the contents of the fingerprint file that Tor writes as a relay, or a 551 if we're not a relay currently. (Added in 0.1.2.3-alpha)

"circuit-status"
A series of lines as for a circuit status event. Each line is of the form described in section 4.1.1, omitting the initial "650 CIRC ". Note that clients must be ready to accept additional arguments as described in section 4.1.

"stream-status"
A series of lines as for a stream status event. Each is of the form:

StreamID SP StreamStatus SP CircuitID SP Target CRLF

"orconn-status"
A series of lines as for an OR connection status event.

In Tor 0.1.2.2-alpha with feature VERBOSE_NAMES enabled and in Tor 0.2.2.1-alpha and later by default, each line is of the form:

LongName SP ORStatus CRLF

In Tor versions 0.1.2.2-alpha through 0.2.2.1-alpha with feature VERBOSE_NAMES turned off and before version 0.1.2.2-alpha, each line is of the form:

ServerID SP ORStatus CRLF

"entry-guards"
A series of lines listing the currently chosen entry guards, if any.

In Tor 0.1.2.2-alpha with feature VERBOSE_NAMES enabled and in Tor 0.2.2.1-alpha and later by default, each line is of the form:

LongName SP Status [SP ISOTime] CRLF

In Tor versions 0.1.2.2-alpha through 0.2.2.1-alpha with feature VERBOSE_NAMES turned off and before version 0.1.2.2-alpha, each line is of the form:

ServerID2 SP Status [SP ISOTime] CRLF
ServerID2 = Nickname / 40*HEXDIG

The definition of Status is the same for both:

Status = "up" / "never-connected" / "down" / "unusable" / "unlisted"

[From 0.1.1.4-alpha to 0.1.1.10-alpha, entry-guards was called "helper-nodes". Tor still supports calling "helper-nodes", but it is deprecated and should not be used.]

[Older versions of Tor (before 0.1.2.x-final) generated 'down' instead of unlisted/unusable.
Between 0.1.2.x-final and 0.2.6.3-alpha, 'down' was never generated.]

[XXXX ServerID2 differs from ServerID in not prefixing fingerprints with a $. This is an implementation error. It would be nice to add the $ back in if we can do so without breaking compatibility.]

"traffic/read" -- Total bytes read (downloaded).

"traffic/written" -- Total bytes written (uploaded).

"uptime" -- Uptime of the Tor daemon (in seconds). Added in 0.3.5.1-alpha.

"accounting/enabled"
"accounting/hibernating"
"accounting/bytes"
"accounting/bytes-left"
"accounting/interval-start"
"accounting/interval-wake"
"accounting/interval-end"
Information about accounting status. If accounting is enabled, "enabled" is 1; otherwise it is 0. The "hibernating" field is "hard" if we are accepting no data; "soft" if we're accepting no new connections, and "awake" if we're not hibernating at all. The "bytes" and "bytes-left" fields contain (read-bytes SP write-bytes), for the start and the rest of the interval respectively. The 'interval-start' and 'interval-end' fields are the borders of the current interval; the 'interval-wake' field is the time within the current interval (if any) where we plan[ned] to start being active. The times are UTC.

"config/names"
A series of lines listing the available configuration options. Each is of the form:

OptionName SP OptionType [ SP Documentation ] CRLF
OptionName = Keyword
OptionType = "Integer" / "TimeInterval" / "TimeMsecInterval" / "DataSize" / "Float" / "Boolean" / "Time" / "CommaList" / "Dependent" / "Virtual" / "String" / "LineList"
Documentation = Text

Note: The incorrect spelling "Dependant" was used from the time this key was introduced in Tor 0.1.1.4-alpha until it was corrected in Tor 0.3.0.2-alpha. It is recommended that clients accept both spellings.

"config/defaults"
A series of lines listing default values for each configuration option. Options which don't have a valid default don't show up in the list. Introduced in Tor 0.2.4.1-alpha.

OptionName SP OptionValue CRLF
OptionName = Keyword
OptionValue = Text

"info/names"
A series of lines listing the available GETINFO options. Each is of one of these forms:

OptionName SP Documentation CRLF
OptionPrefix SP Documentation CRLF
OptionPrefix = OptionName "/*"

The OptionPrefix form indicates a number of options beginning with the prefix. So if "config/*" is listed, other options beginning with "config/" will work, but "config/*" itself is not an option.

"events/names"
A space-separated list of all the events supported by this version of Tor's SETEVENTS.

"features/names"
A space-separated list of all the features supported by this version of Tor's USEFEATURE.

"signal/names"
A space-separated list of all the values supported by the SIGNAL command.

"ip-to-country/ipv4-available"
"ip-to-country/ipv6-available"
"1" if the relevant geoip or geoip6 database is present; "0" otherwise. This field was added in Tor 0.3.2.1-alpha.

"ip-to-country/*"
Maps IP addresses to 2-letter country codes. For example, "GETINFO ip-to-country/18.0.0.1" should give "US".

"process/pid" -- Process id belonging to the main tor process.

"process/uid" -- User id running the tor process, -1 if unknown (this is unimplemented on Windows, returning -1).

"process/user" -- Username under which the tor process is running, providing an empty string if none exists (this is unimplemented on Windows, returning an empty string).

"process/descriptor-limit" -- Upper bound on the file descriptor limit, -1 if unknown

"dir/status-vote/current/consensus" [added in Tor 0.2.1.6-alpha]
"dir/status-vote/current/consensus-microdesc" [added in Tor 0.4.3.1-alpha]
"dir/status/authority"
"dir/status/fp/<F>"
"dir/status/fp/<F1>+<F2>+<F3>"
"dir/status/all"
"dir/server/fp/<F>"
"dir/server/fp/<F1>+<F2>+<F3>"
"dir/server/d/<D>"
"dir/server/d/<D1>+<D2>+<D3>"
"dir/server/authority"
"dir/server/all"
A series of lines listing directory contents, provided according to the specification for the URLs listed in Section 4.4 of dir-spec.txt. Note that Tor MUST NOT provide private information, such as descriptors for routers not marked as general-purpose. When asked for 'authority' information for which this Tor is not authoritative, Tor replies with an empty string.

Note that, as of Tor 0.2.3.3-alpha, Tor clients don't download server descriptors anymore, but microdescriptors. So, a "551 Servers unavailable" reply to all "GETINFO dir/server/*" requests is actually correct. If you have an old program which absolutely requires server descriptors to work, try setting UseMicrodescriptors 0 or FetchUselessDescriptors 1 in your client's torrc.

"status/circuit-established"
"status/enough-dir-info"
"status/good-server-descriptor"
"status/accepted-server-descriptor"
"status/..."
These provide the current internal Tor values for various Tor states. See Section 4.1.10 for explanations. (Only a few of the status events are available as getinfo's currently. Let us know if you want more exposed.)

"status/reachability-succeeded/or"
0 or 1, depending on whether we've found our ORPort reachable.

"status/reachability-succeeded/dir"
0 or 1, depending on whether we've found our DirPort reachable. 1 if there is no DirPort, and therefore no need for a reachability check.

"status/reachability-succeeded"
"OR=" ("0"/"1") SP "DIR=" ("0"/"1")
Combines status/reachability-succeeded/*; controllers MUST ignore unrecognized elements in this entry.

"status/bootstrap-phase"
Returns the most recent bootstrap phase status event sent. Specifically, it returns a string starting with either "NOTICE BOOTSTRAP ..." or "WARN BOOTSTRAP ...". Controllers should use this getinfo when they connect or attach to Tor to learn its current bootstrap state.

"status/version/recommended"
List of currently recommended versions.

"status/version/current"
Status of the current version. One of: new, old, unrecommended, recommended, new in series, obsolete, unknown.

"status/clients-seen"
A summary of which countries we've seen clients from recently, formatted the same as the CLIENTS_SEEN status event described in Section 4.1.14. This GETINFO option is currently available only for bridge relays.

"status/fresh-relay-descs"
Provides fresh server and extra-info descriptors for our relay. Note this is *not* the latest descriptors we've published, but rather what we would generate if we needed to make a new descriptor right now.

"net/listeners/*"
A quoted, space-separated list of the locations where Tor is listening for connections of the specified type. These can contain IPv4 network addresses...

"127.0.0.1:9050" "127.0.0.1:9051"

... or local unix sockets...

"unix:/home/my_user/.tor/socket"

... or IPv6 network addresses:

"[2001:0db8:7000:0000:0000:dead:beef:1234]:9050"

[New in Tor 0.2.2.26-beta.]

"net/listeners/or"
Listeners for OR connections. Talks Tor protocol as described in tor-spec.txt.

"net/listeners/dir"
Listeners for Tor directory protocol, as described in dir-spec.txt.

"net/listeners/socks"
Listeners for onion proxy connections that talk SOCKS4/4a/5 protocol.

"net/listeners/trans"
Listeners for transparent connections redirected by firewall, such as pf or netfilter.

"net/listeners/natd"
Listeners for transparent connections redirected by natd.

"net/listeners/dns"
Listeners for a subset of DNS protocol that Tor network supports.

"net/listeners/control"
Listeners for Tor control protocol, described herein.

"net/listeners/extor"
Listeners corresponding to Extended ORPorts for integration with pluggable transports. See proposals 180 and 196.

"net/listeners/httptunnel"
Listeners for onion proxy connections that leverage HTTP CONNECT tunnelling.

[The extor and httptunnel lists were added in 0.3.2.12, 0.3.3.10, and 0.3.4.6-rc.]

"dir-usage"
A newline-separated list of how many bytes we've served to answer each type of directory request. The format of each line is:

Keyword 1*SP Integer 1*SP Integer

where the first integer is the number of bytes written, and the second is the number of requests answered.

[This feature was added in Tor 0.2.2.1-alpha, and removed in Tor 0.2.9.1-alpha. Even when it existed, it only provided useful output when the Tor client was built with either the INSTRUMENT_DOWNLOADS or RUNNING_DOXYGEN compile-time options.]

"bw-event-cache"
A space-separated summary of recent BW events in chronological order from oldest to newest. Each event is represented by a comma-separated tuple of "R,W", R is the number of bytes read, and W is the number of bytes written. These entries each represent about one second's worth of traffic. [New in Tor 0.2.6.3-alpha]

"consensus/valid-after"
"consensus/fresh-until"
"consensus/valid-until"
Each of these produces an ISOTime describing part of the lifetime of the current (valid, accepted) consensus that Tor has. [New in Tor 0.2.6.3-alpha]

"hs/client/desc/id/<ADDR>"
Prints the content of the hidden service descriptor corresponding to the given <ADDR> which is an onion address without the ".onion" part. The client's cache is queried to find the descriptor. The format of the descriptor is described in section 1.3 of the rend-spec.txt document. If <ADDR> is unrecognized or if not found in the cache, a 551 error is returned.
[New in Tor 0.2.7.1-alpha] [HS v3 support added 0.3.3.1-alpha]

"hs/service/desc/id/<ADDR>"
Prints the content of the hidden service descriptor corresponding to the given <ADDR> which is an onion address without the ".onion" part. The service's local descriptor cache is queried to find the descriptor. The format of the descriptor is described in section 1.3 of the rend-spec.txt document. If <ADDR> is unrecognized or if not found in the cache, a 551 error is returned. [New in Tor 0.2.7.2-alpha] [HS v3 support added 0.3.3.1-alpha]

"onions/current"
"onions/detached"
A newline-separated list of the Onion ("Hidden") Services created via the "ADD_ONION" command. The 'current' key returns Onion Services belonging to the current control connection. The 'detached' key returns Onion Services detached from the parent control connection (as in, belonging to no control connection). The format of each line is:

HSAddress

[New in Tor 0.2.7.1-alpha.] [HS v3 support added 0.3.3.1-alpha]

"network-liveness"
The string "up" or "down", indicating whether we currently believe the network is reachable.

"downloads/"
The keys under downloads/ are used to query download statuses; they all return either a sequence of newline-terminated hex encoded digests, or a "serialized download status" as follows:

SerializedDownloadStatus =
-- when do we plan to next attempt to download this object?
"next-attempt-at" SP ISOTime CRLF
-- how many times have we failed since the last success?
"n-download-failures" SP UInt CRLF
-- how many times have we tried to download this?
"n-download-attempts" SP UInt CRLF
-- according to which schedule rule will we download this?
"schedule" SP DownloadSchedule CRLF
-- do we want to fetch this from an authority, or will any cache do?
"want-authority" SP DownloadWantAuthority CRLF
-- do we increase our download delay whenever we fail to fetch this,
-- or whenever we attempt fetching this?
"increment-on" SP DownloadIncrementOn CRLF
-- do we increase the download schedule deterministically, or at
-- random?
"backoff" SP DownloadBackoff CRLF
[
-- with an exponential backoff, where are we in the schedule?
"last-backoff-position" SP UInt CRLF
-- with an exponential backoff, what was our last delay?
"last-delay-used" SP UInt CRLF
]

where

DownloadSchedule = "DL_SCHED_GENERIC" / "DL_SCHED_CONSENSUS" / "DL_SCHED_BRIDGE"
DownloadWantAuthority = "DL_WANT_ANY_DIRSERVER" / "DL_WANT_AUTHORITY"
DownloadIncrementOn = "DL_SCHED_INCREMENT_FAILURE" / "DL_SCHED_INCREMENT_ATTEMPT"
DownloadBackoff = "DL_SCHED_DETERMINISTIC" / "DL_SCHED_RANDOM_EXPONENTIAL"

The optional last two lines must be present if DownloadBackoff is "DL_SCHED_RANDOM_EXPONENTIAL" and must be absent if DownloadBackoff is "DL_SCHED_DETERMINISTIC".

In detail, the keys supported are:

"downloads/networkstatus/ns"
The SerializedDownloadStatus for the NS-flavored consensus for whichever bootstrap state Tor is currently in.

"downloads/networkstatus/ns/bootstrap"
The SerializedDownloadStatus for the NS-flavored consensus at bootstrap time, regardless of whether we are currently bootstrapping.

"downloads/networkstatus/ns/running"
The SerializedDownloadStatus for the NS-flavored consensus when running, regardless of whether we are currently bootstrapping.

"downloads/networkstatus/microdesc"
The SerializedDownloadStatus for the microdesc-flavored consensus for whichever bootstrap state Tor is currently in.

"downloads/networkstatus/microdesc/bootstrap"
The SerializedDownloadStatus for the microdesc-flavored consensus at bootstrap time, regardless of whether we are currently bootstrapping.

"downloads/networkstatus/microdesc/running"
The SerializedDownloadStatus for the microdesc-flavored consensus when running, regardless of whether we are currently bootstrapping.

"downloads/cert/fps"
A newline-separated list of hex-encoded digests for authority certificates for which we have download status available.

"downloads/cert/fp/<Fingerprint>"
A SerializedDownloadStatus for the default certificate for the identity digest <Fingerprint> returned by the downloads/cert/fps key.

"downloads/cert/fp/<Fingerprint>/sks"
A newline-separated list of hex-encoded signing key digests for the authority identity digest <Fingerprint> returned by the downloads/cert/fps key.

"downloads/cert/fp/<Fingerprint>/<SKDigest>"
A SerializedDownloadStatus for the certificate for the identity digest <Fingerprint> returned by the downloads/cert/fps key and signing key digest <SKDigest> returned by the downloads/cert/fp/<Fingerprint>/sks key.

"downloads/desc/descs"
A newline-separated list of hex-encoded router descriptor digests [note, not identity digests - the Tor process may not have seen them yet while downloading router descriptors]. If the Tor process is not using a NS-flavored consensus, a 551 error is returned.

"downloads/desc/<Digest>"
A SerializedDownloadStatus for the router descriptor with digest <Digest> as returned by the downloads/desc/descs key. If the Tor process is not using a NS-flavored consensus, a 551 error is returned.

"downloads/bridge/bridges"
A newline-separated list of hex-encoded bridge identity digests. If the Tor process is not using bridges, a 551 error is returned.

"downloads/bridge/<Digest>"
A SerializedDownloadStatus for the bridge descriptor with identity digest <Digest> as returned by the downloads/bridge/bridges key. If the Tor process is not using bridges, a 551 error is returned.

"sr/current"
"sr/previous"
The current or previous shared random value, as received in the consensus, base-64 encoded. An empty value means that either the consensus has no shared random value, or Tor has no consensus.

"current-time/local"
"current-time/utc"
The current system or UTC time, as returned by the system, in ISOTime2 format. (Introduced in 0.3.4.1-alpha.)

"stats/ntor/requested"
"stats/ntor/assigned"
The NTor circuit onion handshake rephist values which are requested or assigned. (Introduced in 0.4.5.1-alpha)

"stats/tap/requested"
"stats/tap/assigned"
The TAP circuit onion handshake rephist values which are requested or assigned. (Introduced in 0.4.5.1-alpha)

"config-can-saveconf"
0 or 1, depending on whether it is possible to use SAVECONF without the FORCE flag. (Introduced in 0.3.1.1-alpha.)

"limits/max-mem-in-queues"
The amount of memory that Tor's out-of-memory checker will allow Tor to allocate (in places it can see) before it starts freeing memory and killing circuits. See the MaxMemInQueues option for more details. Unlike the option, this value reflects Tor's actual limit, and may be adjusted depending on the available system memory rather than on the MaxMemInQueues option. (Introduced in 0.2.5.4-alpha)

Examples:

C: GETINFO version desc/name/moria1
S: 250+desc/name/moria1=
S: [Descriptor for moria1]
S: .
S: 250-version=Tor 0.1.1.0-alpha-cvs
S: 250 OK

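The GETINFO reply framing (single-line "250-keyword=value", multi-line "250+keyword=" terminated by a lone ".", and a closing "250 OK") can be illustrated with a small Python parser. This is a sketch under stated assumptions: the function name is hypothetical, lines are assumed to have their CRLF stripped, and a leading "." on a data line is assumed to have been doubled by the sender.

```python
def parse_getinfo_reply(lines):
    """Parse GETINFO reply lines (CRLF already stripped) into a dict."""
    values = {}
    it = iter(lines)
    for line in it:
        code, sep, rest = line[:3], line[3:4], line[4:]
        if code != "250":
            # 551/552 (or anything else) means the request failed.
            raise RuntimeError(f"GETINFO failed: {line}")
        if sep == " ":
            break  # final "250 OK"
        key, _, value = rest.partition("=")
        if sep == "+":
            # Multi-line value: read data lines until a lone ".".
            data = []
            for data_line in it:
                if data_line == ".":
                    break
                # Assumption: undo dot-stuffing of lines starting with ".".
                data.append(data_line[1:] if data_line.startswith(".") else data_line)
            value = "\n".join(data)
        values[key] = value
    return values
```

Feeding it the five server lines of a reply to "GETINFO version desc/name/moria1" yields a dict with one entry per requested key.
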
EXTENDCIRCUIT

Sent from the client to the server. The format is:

"EXTENDCIRCUIT" SP CircuitID [SP ServerSpec *("," ServerSpec)] [SP "purpose=" Purpose] CRLF

This request takes one of two forms: either the CircuitID is zero, in which case it is a request for the server to build a new circuit, or the CircuitID is nonzero, in which case it is a request for the server to extend an existing circuit with that ID according to the specified path.

If the CircuitID is 0, the controller has the option of providing a path for Tor to use to build the circuit. If it does not provide a path, Tor will select one automatically from high capacity nodes according to path-spec.txt.

If CircuitID is 0 and "purpose=" is specified, then the circuit's purpose is set. Two choices are recognized: "general" and "controller". If not specified, circuits are created as "general".

If the request is successful, the server sends a reply containing a message body consisting of the CircuitID of the (maybe newly created) circuit. The syntax is "250" SP "EXTENDED" SP CircuitID CRLF.
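
As an illustration of the request and reply syntax above, a controller might assemble the command and extract the circuit ID as follows. This is a minimal sketch; the function names and the sample relay names are hypothetical, and the reply line is assumed to have its CRLF stripped.

```python
import re

_EXTENDED = re.compile(r"250 EXTENDED (\S+)")


def build_extendcircuit(circuit_id="0", path=None, purpose=None):
    """Build an EXTENDCIRCUIT line; circuit_id "0" asks for a new circuit."""
    parts = ["EXTENDCIRCUIT", str(circuit_id)]
    if path:
        parts.append(",".join(path))  # ServerSpec *("," ServerSpec)
    if purpose is not None:
        if purpose not in ("general", "controller"):
            raise ValueError(f"unrecognized purpose: {purpose}")
        parts.append(f"purpose={purpose}")
    return " ".join(parts) + "\r\n"


def parse_extended(line):
    """Return the CircuitID from a '250 EXTENDED <id>' reply line."""
    match = _EXTENDED.fullmatch(line)
    if match is None:
        raise ValueError(f"not an EXTENDED reply: {line}")
    return match.group(1)
```

For example, build_extendcircuit("0", ["relayA", "relayB"], "controller") produces a request for a new controller-purpose circuit through two named relays.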

SETCIRCUITPURPOSE

Sent from the client to the server. The format is:

"SETCIRCUITPURPOSE" SP CircuitID SP "purpose=" Purpose CRLF

This changes the circuit's purpose. See EXTENDCIRCUIT above for details.

SETROUTERPURPOSE

Sent from the client to the server. The format is:

"SETROUTERPURPOSE" SP NicknameOrKey SP Purpose CRLF

This changes the descriptor's purpose. See +POSTDESCRIPTOR below for details.

NOTE: This command was disabled and made obsolete as of Tor 0.2.0.8-alpha. It doesn't exist anymore, and is listed here only for historical interest.

ATTACHSTREAM

Sent from the client to the server. The syntax is:

"ATTACHSTREAM" SP StreamID SP CircuitID [SP "HOP=" HopNum] CRLF

This message informs the server that the specified stream should be associated with the specified circuit. Each stream may be associated with at most one circuit, and multiple streams may share the same circuit. Streams can only be attached to completed circuits (that is, circuits that have sent a circuit status 'BUILT' event or are listed as built in a GETINFO circuit-status request).

If the circuit ID is 0, responsibility for attaching the given stream is returned to Tor.

If HOP=HopNum is specified, Tor will choose the HopNumth hop in the circuit as the exit node, rather than the last node in the circuit. Hops are 1-indexed; generally, it is not permitted to attach to hop 1.

Tor responds with "250 OK" if it can attach the stream, 552 if the circuit or stream didn't exist, 555 if the stream isn't in an appropriate state to be attached (e.g. it's already open), or 551 if the stream couldn't be attached for another reason.
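
The command syntax and reply codes described above might be handled along these lines. The sketch is illustrative only; the names are hypothetical and the descriptions simply paraphrase the preceding paragraph.

```python
# Reply codes for ATTACHSTREAM, paraphrased from the text above.
ATTACH_REPLIES = {
    "250": "stream attached",
    "552": "circuit or stream does not exist",
    "555": "stream is not in an appropriate state to be attached",
    "551": "stream could not be attached for another reason",
}


def build_attachstream(stream_id, circuit_id, hop=None):
    """Build an ATTACHSTREAM line; circuit_id "0" hands the stream back to Tor."""
    cmd = f"ATTACHSTREAM {stream_id} {circuit_id}"
    if hop is not None:
        if hop < 1:
            raise ValueError("hops are 1-indexed")
        cmd += f" HOP={hop}"
    return cmd + "\r\n"


def describe_attach_reply(line):
    """Map the three-digit status code of a reply line to a description."""
    return ATTACH_REPLIES.get(line[:3], "unexpected reply")
```

The builder does not reject HOP=1, since the text only says that attaching to hop 1 is generally not permitted; Tor itself remains the authority on that.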

{Implementation note: Tor will close unattached streams by itself, roughly two minutes after they are born. Let the developers know if that turns out to be a problem.}

{Implementation note: By default, Tor automatically attaches streams to circuits itself, unless the configuration variable "__LeaveStreamsUnattached" is set to "1". Attempting to attach streams via the control protocol when "__LeaveStreamsUnattached" is false may cause a race between Tor and the controller, as both attempt to attach streams to circuits.}

{Implementation note: You can try to attachstream to a stream that has already sent a connect or resolve request but hasn't succeeded yet, in which case Tor will detach the stream from its current circuit before proceeding with the new attach request.}

POSTDESCRIPTOR

Sent from the client to the server. The syntax is:

"+POSTDESCRIPTOR" [SP "purpose=" Purpose] [SP "cache=" Cache] CRLF Descriptor CRLF "." CRLF

This message informs the server about a new descriptor. If Purpose is specified, it must be either "general", "controller", or "bridge", else we return a 552 error. The default is "general".

If Cache is specified, it must be either "no" or "yes", else we return a 552 error. If Cache is not specified, Tor will decide for itself whether it wants to cache the descriptor, and controllers must not rely on its choice.

The descriptor, when parsed, must contain a number of well-specified fields, including fields for its nickname and identity.

If there is an error in parsing the descriptor, the server must send a "554 Invalid descriptor" reply. If the descriptor is well-formed but the server chooses not to add it, it must reply with a 251 message whose body explains why the server was not added. If the descriptor is added, Tor replies with "250 OK".
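
A multi-line +POSTDESCRIPTOR command could be assembled as in the sketch below. The function name is hypothetical, and the dot-stuffing step (doubling a leading "." on a data line) is an assumption about the data encoding, which is defined outside this section.

```python
def build_postdescriptor(descriptor, purpose=None, cache=None):
    """Build a +POSTDESCRIPTOR command with its data block and terminator."""
    if purpose is not None and purpose not in ("general", "controller", "bridge"):
        raise ValueError(f"invalid purpose: {purpose}")
    if cache is not None and cache not in ("yes", "no"):
        raise ValueError(f"invalid cache value: {cache}")

    header = "+POSTDESCRIPTOR"
    if purpose is not None:
        header += f" purpose={purpose}"
    if cache is not None:
        header += f" cache={cache}"

    lines = [header]
    for line in descriptor.splitlines():
        # Assumption: data lines starting with "." get an extra "." prepended.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # end-of-data marker
    return "\r\n".join(lines) + "\r\n"
```

Validating Purpose and Cache locally mirrors the 552 errors Tor would return, but the reply (250, 251, or 554) still has to be checked.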

REDIRECTSTREAM

Sent from the client to the server. The syntax is:

"REDIRECTSTREAM" SP StreamID SP Address [SP Port] CRLF

Tells the server to change the exit address on the specified stream. If Port is specified, changes the destination port as well. No remapping is performed on the new provided address.

To be sure that the modified address will be used, this command must be sent after a new stream event is received, and before attaching this stream to a circuit.

Tor replies with "250 OK" on success.

CLOSESTREAM

Sent from the client to the server. The syntax is:

"CLOSESTREAM" SP StreamID SP Reason *(SP Flag) CRLF

Tells the server to close the specified stream. The reason should be one of the Tor RELAY_END reasons given in tor-spec.txt, as a decimal. Flags is not used currently; Tor servers SHOULD ignore unrecognized flags. Tor may hold the stream open for a while to flush any data that is pending.

Tor replies with "250 OK" on success, or a 512 if there aren't enough arguments, or a 552 if it doesn't recognize the StreamID or reason.

CLOSECIRCUIT

The syntax is:

"CLOSECIRCUIT" SP CircuitID *(SP Flag) CRLF

Flag = "IfUnused"

Tells the server to close the specified circuit. If "IfUnused" is provided, do not close the circuit unless it is unused.

Other flags may be defined in the future; Tor SHOULD ignore unrecognized flags.

Tor replies with "250 OK" on success, or a 512 if there aren't enough arguments, or a 552 if it doesn't recognize the CircuitID.
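
CLOSESTREAM and CLOSECIRCUIT share the same reply codes, so a controller might handle both with helpers like these. This is a hedged sketch; the function names and the choice of exception types are illustrative assumptions.

```python
def build_closestream(stream_id, reason):
    """Build a CLOSESTREAM line; reason is a RELAY_END reason as a decimal."""
    if not isinstance(reason, int) or reason < 0:
        raise ValueError("reason must be a non-negative integer")
    return f"CLOSESTREAM {stream_id} {reason}\r\n"


def build_closecircuit(circuit_id, if_unused=False):
    """Build a CLOSECIRCUIT line, optionally with the IfUnused flag."""
    cmd = f"CLOSECIRCUIT {circuit_id}"
    if if_unused:
        cmd += " IfUnused"
    return cmd + "\r\n"


def check_close_reply(line):
    """Raise on the error codes named above; return None on 250."""
    code = line[:3]
    if code == "250":
        return None
    if code == "512":
        raise ValueError(f"not enough arguments: {line}")
    if code == "552":
        raise LookupError(f"unrecognized ID or reason: {line}")
    raise RuntimeError(f"unexpected reply: {line}")
```

No flags are emitted for CLOSESTREAM, since none are currently defined.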

QUIT

Tells the server to hang up on this controller connection. This command can be used before authenticating.

USEFEATURE

Adding additional features to the control protocol sometimes will break backwards compatibility. Initially such features are added into Tor and disabled by default. USEFEATURE can enable these additional features.

The syntax is:

"USEFEATURE" *(SP FeatureName) CRLF

FeatureName = 1*(ALPHA / DIGIT / "_" / "-")

Feature names are case-insensitive.

Once enabled, a feature stays enabled for the duration of the connection to the controller. A new connection to the controller must be opened to disable an enabled feature.

Features are a forward-compatibility mechanism; each feature will eventually become a standard part of the control protocol. Once a feature becomes part of the protocol, it is always-on. Each feature documents the version in which it was introduced as a feature and the version in which it became part of the protocol.

Tor will ignore a request to use any feature that is always-on. Tor will give a 552 error in response to an unrecognized feature.
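
The FeatureName grammar and the case-insensitivity rule can be checked on the controller side before sending, as in this sketch. The helper names are hypothetical.

```python
import re

# FeatureName = 1*(ALPHA / DIGIT / "_" / "-")
_FEATURE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def build_usefeature(names):
    """Build a USEFEATURE line after validating each feature name."""
    for name in names:
        if _FEATURE_NAME.fullmatch(name) is None:
            raise ValueError(f"invalid feature name: {name!r}")
    return " ".join(["USEFEATURE", *names]) + "\r\n"


def same_feature(a, b):
    """Compare two feature names case-insensitively."""
    return a.upper() == b.upper()
```

An empty list is accepted because the grammar permits zero feature names; a 552 reply still has to be handled for names Tor does not recognize.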

EXTENDED_EVENTS

Same as passing 'EXTENDED' to SETEVENTS; this is the preferred way to request the extended event syntax.

This feature was first introduced in 0.1.2.3-alpha. It is always-on and part of the protocol in Tor 0.2.2.1-alpha and later.

VERBOSE_NAMES

Replaces ServerID with LongName in events and GETINFO results. LongName provides a Fingerprint for all routers, an indication of Named status, and a Nickname if one is known. LongName is strictly more informative than ServerID, which only provides either a Fingerprint or a Nickname.

This feature was first introduced in 0.1.2.2-alpha. It is always-on and part of the protocol in Tor 0.2.2.1-alpha and later.

RESOLVE

The syntax is:

"RESOLVE" *Option *Address CRLF

Option = "mode=reverse"
Address = a hostname or IPv4 address

This command launches a remote hostname lookup request for every specified address (or a reverse lookup if "mode=reverse" is specified). Note that the request is done in the background: to see the answers, your controller will need to listen for ADDRMAP events; see 4.1.7 below.

[Added in Tor 0.2.0.3-alpha]

PROTOCOLINFO

The syntax is:

"PROTOCOLINFO" *(SP PIVERSION) CRLF

The server reply format is:

"250-PROTOCOLINFO" SP PIVERSION CRLF *InfoLine "250 OK" CRLF

InfoLine = AuthLine / VersionLine / OtherLine

AuthLine = "250-AUTH" SP "METHODS=" AuthMethod *("," AuthMethod)
           *(SP "COOKIEFILE=" AuthCookieFile) CRLF
VersionLine = "250-VERSION" SP "Tor=" TorVersion OptArguments CRLF

AuthMethod =
  "NULL"           / ; No authentication is required
  "HASHEDPASSWORD" / ; A controller must supply the original password
  "COOKIE"         / ; ... or supply the contents of a cookie file
  "SAFECOOKIE"       ; ... or prove knowledge of a cookie file's contents

AuthCookieFile = QuotedString
TorVersion = QuotedString

OtherLine = "250-" Keyword OptArguments CRLF

PIVERSION: 1*DIGIT

This command tells the controller what kinds of authentication are supported.

Tor MAY give its InfoLines in any order; controllers MUST ignore InfoLines with keywords they do not recognize. Controllers MUST ignore extraneous data on any InfoLine.

PIVERSION is there in case we drastically change the syntax one day. For now it should always be "1". Controllers MAY provide a list of the protocolinfo versions they support; Tor MAY select a version that the controller does not support.

AuthMethod is used to specify one or more control authentication methods that Tor currently accepts.

AuthCookieFile specifies the absolute path and filename of the authentication cookie that Tor is expecting and is provided iff the METHODS field contains the method "COOKIE" and/or "SAFECOOKIE". Controllers MUST handle escape sequences inside this string.

All authentication cookies are 32 bytes long. Controllers MUST NOT use the contents of a non-32-byte-long file as an authentication cookie.

If the METHODS field contains the method "SAFECOOKIE", every AuthCookieFile must contain the same authentication cookie.

The COOKIE authentication method exposes the user running a controller to an unintended information disclosure attack whenever the controller has greater filesystem read access than the process that it has connected to. (Note that a controller may connect to a process other than Tor.) It is almost never safe to use, even if the controller's user has explicitly specified which filename to read an authentication cookie from. For this reason, the COOKIE authentication method has been deprecated and will be removed from a future version of Tor.

The VERSION line contains the Tor version.

[Unlike other commands besides AUTHENTICATE, PROTOCOLINFO may be used (but only once!) before AUTHENTICATE.]

[PROTOCOLINFO was not supported before Tor 0.2.0.5-alpha.]
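As an illustration of the parsing rules above, here is a minimal Python sketch of how a controller might read a PROTOCOLINFO reply. The function name `parse_protocolinfo` and the returned dictionary layout are assumptions made for this example, not part of the protocol. The sketch only undoes simple backslash escapes (such as `\\` and `\"`) inside QuotedStrings; a complete controller would need to handle every escape sequence the protocol allows.

```python
import re

# Matches the body of a QuotedString, allowing backslash-escaped characters.
_QUOTED_BODY = r'((?:[^"\\]|\\.)*)'


def parse_protocolinfo(lines):
    """Collect auth methods, cookie files and the Tor version from reply lines.

    `lines` is a list of reply lines with the trailing CRLF already removed.
    """
    info = {"methods": [], "cookie_files": [], "tor_version": None}
    for line in lines:
        if line.startswith("250-AUTH "):
            methods = re.search(r"METHODS=([A-Za-z0-9,]+)", line)
            if methods:
                info["methods"] = methods.group(1).split(",")
            for quoted in re.findall('COOKIEFILE="' + _QUOTED_BODY + '"', line):
                # Simplified unescaping: drop the backslash before any character.
                info["cookie_files"].append(re.sub(r"\\(.)", r"\1", quoted))
        elif line.startswith("250-VERSION "):
            version = re.search('Tor="' + _QUOTED_BODY + '"', line)
            if version:
                info["tor_version"] = version.group(1)
        # InfoLines with unrecognized keywords are skipped, as required above.
    return info
```

Because unknown InfoLines fall through without effect, the same function keeps working if a later Tor version adds new keywords.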

LOADCONF

The syntax is:

"+LOADCONF" CRLF ConfigText CRLF "." CRLF

This command allows a controller to upload the text of a config file to Tor over the control port. This config file is then loaded as if it had been read from disk.

[LOADCONF was added in Tor 0.2.1.1-alpha.]

TAKEOWNERSHIP

The syntax is:

"TAKEOWNERSHIP" CRLF

This command instructs Tor to shut down when this control connection is closed. This command affects each control connection that sends it independently; if multiple control connections send the TAKEOWNERSHIP command to a Tor instance, Tor will shut down when any of those connections closes.

(As of Tor 0.2.5.2-alpha, Tor does not wait a while for circuits to close when shutting down because of an exiting controller. If you want to ensure a clean shutdown--and you should!--then send "SIGNAL SHUTDOWN" and wait for the Tor process to close.)

This command is intended to be used with the __OwningControllerProcess configuration option. A controller that starts a Tor process which the user cannot easily control or stop should 'own' that Tor process:

* When starting Tor, the controller should specify its PID in an __OwningControllerProcess on Tor's command line. This will cause Tor to poll for the existence of a process with that PID, and exit if it does not find such a process. (This is not a completely reliable way to detect whether the 'owning controller' is still running, but it should work well enough in most cases.)

* Once the controller has connected to Tor's control port, it should send the TAKEOWNERSHIP command along its control connection. At this point, *both* the TAKEOWNERSHIP command and the __OwningControllerProcess option are in effect: Tor will exit when the control connection ends *and* Tor will exit if it detects that there is no process with the PID specified in the __OwningControllerProcess option.

* After the controller has sent the TAKEOWNERSHIP command, it should send "RESETCONF __OwningControllerProcess" along its control connection. This will cause Tor to stop polling for the existence of a process with its owning controller's PID; Tor will still exit when the control connection ends.

[TAKEOWNERSHIP was added in Tor 0.2.2.28-beta.]
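The three steps above can be sketched in Python as a plan that a controller would then carry out. This is only an illustration: the helper name `owning_controller_plan` is hypothetical, the executable name `tor` is an assumption about the local installation, and the sketch neither launches a process nor opens a control connection. Authentication, which must happen before TAKEOWNERSHIP is sent, is omitted.

```python
import os


def owning_controller_plan(controller_pid=None):
    """Return (tor_argv, control_commands) for the ownership hand-over.

    tor_argv passes the controller's PID in __OwningControllerProcess;
    control_commands lists what to send, in order, once the controller
    has connected and authenticated.
    """
    pid = os.getpid() if controller_pid is None else controller_pid
    # Assumed executable name; the option is given as "Name value".
    tor_argv = ["tor", "__OwningControllerProcess", str(pid)]
    control_commands = [
        "TAKEOWNERSHIP\r\n",
        "RESETCONF __OwningControllerProcess\r\n",
    ]
    return tor_argv, control_commands
```

Sending RESETCONF only after TAKEOWNERSHIP preserves the property described above: at every moment at least one of the two exit triggers is in effect.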

AUTHCHALLENGE

The syntax is:

"AUTHCHALLENGE" SP "SAFECOOKIE" SP ClientNonce CRLF

ClientNonce = 2*HEXDIG / QuotedString

This command is used to begin the authentication routine for the SAFECOOKIE method of authentication.

If the server accepts the command, the server reply format is:

"250 AUTHCHALLENGE" SP "SERVERHASH=" ServerHash SP "SERVERNONCE=" ServerNonce CRLF

ServerHash = 64*64HEXDIG
ServerNonce = 64*64HEXDIG

The ClientNonce, ServerHash, and ServerNonce values are encoded/decoded in the same way as the argument passed to the AUTHENTICATE command. ServerNonce MUST be 32 bytes long.

ServerHash is computed as:

HMAC-SHA256("Tor safe cookie authentication server-to-controller hash",
            CookieString | ClientNonce | ServerNonce)

(with the HMAC key as its first argument)

After a controller sends a successful AUTHCHALLENGE command, the next command sent on the connection must be an AUTHENTICATE command, and the only authentication string which that AUTHENTICATE command will accept is:

HMAC-SHA256("Tor safe cookie authentication controller-to-server hash", CookieString | ClientNonce | ServerNonce)

[Unlike other commands besides AUTHENTICATE, AUTHCHALLENGE may be used (but only once!) before AUTHENTICATE.]
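The two HMAC computations described above can be sketched with Python's standard `hmac` and `hashlib` modules. The function names are illustrative assumptions; the key strings and the order of the concatenated values are taken from the text. The sketch assumes the nonces have already been decoded into raw bytes.

```python
import hashlib
import hmac

SERVER_KEY = b"Tor safe cookie authentication server-to-controller hash"
CLIENT_KEY = b"Tor safe cookie authentication controller-to-server hash"


def safecookie_hashes(cookie: bytes, client_nonce: bytes, server_nonce: bytes):
    """Return (expected ServerHash, AUTHENTICATE string) as raw bytes."""
    if len(cookie) != 32:
        raise ValueError("authentication cookies are 32 bytes long")
    if len(server_nonce) != 32:
        raise ValueError("ServerNonce must be 32 bytes long")
    message = cookie + client_nonce + server_nonce
    # The fixed string is the HMAC key; the concatenation is the message.
    server_hash = hmac.new(SERVER_KEY, message, hashlib.sha256).digest()
    client_hash = hmac.new(CLIENT_KEY, message, hashlib.sha256).digest()
    return server_hash, client_hash


def server_hash_matches(expected: bytes, received_hex: str) -> bool:
    """Compare the server's hex-encoded ServerHash in constant time."""
    return hmac.compare_digest(expected, bytes.fromhex(received_hex))
```

A controller would check `server_hash_matches` before sending the second value, hex-encoded, as the argument to AUTHENTICATE; a mismatch suggests that the peer does not know the cookie.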

[AUTHCHALLENGE was added in Tor 0.2.3.13-alpha.]

DROPGUARDS

The syntax is:

"DROPGUARDS" CRLF

Tells the server to drop all guard nodes. Do not invoke this command lightly; it can increase vulnerability to tracking attacks over time.

Tor replies with "250 OK" on success.

[DROPGUARDS was added in Tor 0.2.5.2-alpha.]

HSFETCH

The syntax is:

"HSFETCH" SP (HSAddress / "v" Version "-" DescId)
          *[SP "SERVER=" Server] CRLF

HSAddress = 16*Base32Character / 56*Base32Character
Version = "2" / "3"
DescId = 32*Base32Character
Server = LongName

This command launches hidden service descriptor fetch(es) for the given HSAddress or DescId.

HSAddress can be a version 2 or version 3 address. DescIds can only be version 2 IDs. Version 2 addresses consist of 16 Base32Characters and version 3 addresses consist of 56 Base32Characters.

If a DescId is specified, at least one Server MUST also be provided; otherwise a 512 error is returned. If neither a DescId nor any Server is specified, the command behaves like a normal Tor client descriptor fetch. If one or more Servers are given, they are used instead, and a fetch is triggered on each of them in parallel.

The caching behavior when fetching a descriptor using this command is identical to normal Tor client behavior.

Details on how to compute a descriptor id (DescId) can be found in rend-spec.txt section 1.3.

If any values are unrecognized, a 513 error is returned and the command is stopped. On success, Tor replies "250 OK"; Tor MUST then eventually follow this with both an HS_DESC and an HS_DESC_CONTENT event with the results. If SERVER is specified, events are emitted for each location.

Examples are:

C: HSFETCH v2-gezdgnbvgy3tqolbmjrwizlgm5ugs2tl SERVER=9695DFC35FFEB861329B9F1AB04C46397020CE31
S: 250 OK

C: HSFETCH ajkhdsfuygaesfaa
S: 250 OK

C: HSFETCH vww6ybal4bd7szmgncyruucpgfkqahzddi37ktceo3ah7ngmcopnpyyd
S: 250 OK

[HSFETCH was added in Tor 0.2.7.1-alpha]

[HS v3 support added 0.4.1.1-alpha]
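A controller can check its arguments against the HSFETCH rules before sending the command. The following Python sketch is one way to do so; `build_hsfetch` is a hypothetical helper, and it accepts only "v2-" descriptor IDs because the text states that DescIds can only be version 2 IDs. Server values are passed through without validation.

```python
import re

_BASE32 = "[A-Za-z2-7]"
_HS_ADDRESS = re.compile("(?:%s{16}|%s{56})" % (_BASE32, _BASE32))
_DESC_ID = re.compile("v2-%s{32}" % _BASE32)


def build_hsfetch(target, servers=()):
    """Return the HSFETCH command line for an HSAddress or "v2-" DescId."""
    if _HS_ADDRESS.fullmatch(target):
        pass  # v2 (16 characters) or v3 (56 characters) onion address
    elif _DESC_ID.fullmatch(target):
        if not servers:
            # Tor would answer 512 here; fail early instead.
            raise ValueError("a DescId requires at least one Server")
    else:
        raise ValueError("not a valid HSAddress or DescId: %r" % (target,))
    return "HSFETCH " + target + "".join(" SERVER=" + s for s in servers) + "\r\n"
```

For instance, `build_hsfetch("ajkhdsfuygaesfaa")` yields the second example command above, followed by CRLF.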

ADD_ONION

The syntax is:

"ADD_ONION" SP KeyType ":" KeyBlob
            [SP "Flags=" Flag *("," Flag)]
            [SP "MaxStreams=" NumStreams]
            1*(SP "Port=" VirtPort ["," Target])
            *(SP "ClientAuth=" ClientName [":" ClientBlob]) CRLF
            *(SP "ClientAuthV3=" V3Key) CRLF

KeyType =
  "NEW"        / ; The server should generate a key of algorithm KeyBlob
  "RSA1024"    / ; The server should use the 1024 bit RSA key provided
                 ; in KeyBlob (v2).
  "ED25519-V3"   ; The server should use the ed25519 v3 key provided
                 ; in KeyBlob (v3).

KeyBlob =
  "BEST"       / ; The server should generate a key using the "best"
                 ; supported algorithm (KeyType == "NEW").
                 ; [As of 0.4.2.3-alpha, ED25519-V3 is used]
  "RSA1024"    / ; The server should generate a 1024 bit RSA key
                 ; (KeyType == "NEW") (v2).
  "ED25519-V3" / ; The server should generate an ed25519 private key
                 ; (KeyType == "NEW") (v3).
  String         ; A serialized private key (without whitespace)

Flag =
  "DiscardPK"    / ; The server should not include the newly generated
                   ; private key as part of the response.
  "Detach"       / ; Do not associate the newly created Onion Service
                   ; to the current control connection.
  "BasicAuth"    / ; Client authorization is required using the "basic"
                   ; method (v2 only).
  "V3Auth"       / ; Version 3 client authorization is required (v3 only).
  "NonAnonymous" / ; Add a non-anonymous Single Onion Service. Tor
                   ; checks this flag matches its configured hidden
                   ; service anonymity mode.
  "MaxStreamsCloseCircuit" ; Close the circuit if the maximum streams
                           ; allowed is reached.

NumStreams = A value between 0 and 65535 which is used as the maximum streams that can be attached on a rendezvous circuit. Setting it to 0 means unlimited which is also the default behavior.

VirtPort = The virtual TCP Port for the Onion Service (As in the HiddenServicePort "VIRTPORT" argument).

Target = The (optional) target for the given VirtPort (As in the optional HiddenServicePort "TARGET" argument).

ClientName = An identifier 1 to 16 characters long, using only characters in A-Za-z0-9+-_ (no spaces) (v2 only).

ClientBlob = Authorization data for the client, in an opaque format specific to the authorization method (v2 only).

V3Key = The client's base32-encoded x25519 public key, using only the key part of rend-spec-v3.txt section G.1.2 (v3 only).

The server reply format is:

"250-ServiceID=" ServiceID CRLF
["250-PrivateKey=" KeyType ":" KeyBlob CRLF]
*("250-ClientAuth=" ClientName ":" ClientBlob CRLF)
"250 OK" CRLF

ServiceID = The Onion Service address without the trailing ".onion" suffix

Tells the server to create a new Onion ("Hidden") Service, with the specified private key and algorithm. If a KeyType of "NEW" is selected, the server will generate a new keypair using the selected algorithm. The "Port" argument's VirtPort and Target values have identical semantics to the corresponding HiddenServicePort configuration values.

The server response will only include a private key if the server was requested to generate a new keypair, and also the "DiscardPK" flag was not specified. (Note that if "DiscardPK" flag is specified, there is no way to recreate the generated keypair and the corresponding Onion Service at a later date).

If client authorization is enabled using the "BasicAuth" flag (which is v2 only), the service will not be accessible to clients without valid authorization data (configured with the "HidServAuth" option). The list of authorized clients is specified with one or more "ClientAuth" parameters. If "ClientBlob" is not specified for a client, a new credential will be randomly generated and returned.

Tor instances can either be in anonymous hidden service mode, or non-anonymous single onion service mode. All hidden services on the same tor instance have the same anonymity. To guard against unexpected loss of anonymity, Tor checks that the ADD_ONION "NonAnonymous" flag matches the current hidden service anonymity mode. The hidden service anonymity mode is configured using the Tor options HiddenServiceSingleHopMode and HiddenServiceNonAnonymousMode. If both these options are 1, the "NonAnonymous" flag must be provided to ADD_ONION. If both these options are 0 (the Tor default), the flag must NOT be provided.

Once created the new Onion Service will remain active until either the Onion Service is removed via "DEL_ONION", the server terminates, or the control connection that originated the "ADD_ONION" command is closed. It is possible to override disabling the Onion Service on control connection close by specifying the "Detach" flag.

It is the Onion Service server application's responsibility to close existing client connections if desired after the Onion Service is removed.

(The KeyBlob format is left intentionally opaque; however, for "RSA1024" keys it is currently the Base64 encoded DER representation of a PKCS#1 RSAPrivateKey, with all newlines removed. For an "ED25519-V3" key it is the Base64 encoding of the concatenation of the 32-byte ed25519 secret scalar in little-endian and the 32-byte ed25519 PRF secret.)

[Note: The ED25519-V3 format is not the same as, e.g., SUPERCOP ed25519/ref, which stores the 32-byte ed25519 hash seed concatenated with the 32-byte public key, and which derives the secret scalar and PRF secret by expanding the hash seed with SHA-512. Our key blinding scheme is incompatible with storing private keys as seeds, so we store the secret scalar alongside the PRF secret, and just pay the cost of recomputing the public key when importing an ED25519-V3 key.]
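To make the current "ED25519-V3" KeyBlob layout concrete, the following Python sketch splits a blob into its two halves. The function name is an assumption for illustration. Since the format is intentionally opaque and may change, this shows the layout described above rather than a stable interface.

```python
import base64


def split_ed25519_v3_blob(key_blob: str):
    """Return (secret_scalar, prf_secret) from a Base64 ED25519-V3 KeyBlob.

    The blob is expected to decode to 64 bytes: a 32-byte little-endian
    secret scalar followed by the 32-byte PRF secret.
    """
    raw = base64.b64decode(key_blob, validate=True)
    if len(raw) != 64:
        raise ValueError("expected 64 bytes, got %d" % len(raw))
    secret_scalar = int.from_bytes(raw[:32], "little")
    prf_secret = raw[32:]
    return secret_scalar, prf_secret
```

Note that a 64-byte SUPERCOP-style key (seed followed by public key) would pass the length check but produce meaningless values, which is why the two formats must not be confused.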

Examples:

C: ADD_ONION NEW:BEST Flags=DiscardPK Port=80
S: 250-ServiceID=exampleoniont2pqglbny66wpovyvao3ylc23eileodtevc4b75ikpad
S: 250 OK

C: ADD_ONION RSA1024:[Blob Redacted] Port=80,192.168.1.1:8080
S: 250-ServiceID=sampleonion12456
S: 250 OK

C: ADD_ONION NEW:BEST Port=22 Port=80,8080
S: 250-ServiceID=sampleonion4t2pqglbny66wpovyvao3ylc23eileodtevc4b75ikpad
S: 250-PrivateKey=ED25519-V3:[Blob Redacted]
S: 250 OK

C: ADD_ONION NEW:RSA1024 Flags=DiscardPK,BasicAuth Port=22 ClientAuth=alice:[Blob Redacted] ClientAuth=bob
S: 250-ServiceID=testonion1234567
S: 250-ClientAuth=bob:[Blob Redacted]
S: 250 OK

C: ADD_ONION NEW:ED25519-V3 ClientAuthV3=[Blob Redacted] Port=22
S: 250-ServiceID=n35etu3yjxrqjpntmfziom5sjwspoydchmelc4xleoy4jk2u4lziz2yd
S: 250-ClientAuthV3=[Blob Redacted]
S: 250 OK

Examples with Tor in anonymous onion service mode:

C: ADD_ONION NEW:BEST Flags=DiscardPK Port=22
S: 250-ServiceID=exampleoniont2pqglbny66wpovyvao3ylc23eileodtevc4b75ikpad
S: 250 OK

C: ADD_ONION NEW:BEST Flags=DiscardPK,NonAnonymous Port=22
S: 512 Tor is in anonymous hidden service mode

Examples with Tor in non-anonymous onion service mode:

C: ADD_ONION NEW:BEST Flags=DiscardPK Port=22
S: 512 Tor is in non-anonymous hidden service mode

C: ADD_ONION NEW:BEST Flags=DiscardPK,NonAnonymous Port=22
S: 250-ServiceID=exampleoniont2pqglbny66wpovyvao3ylc23eileodtevc4b75ikpad
S: 250 OK

[ADD_ONION was added in Tor 0.2.7.1-alpha.]

[MaxStreams and MaxStreamsCloseCircuit were added in Tor 0.2.7.2-alpha]

[ClientAuth was added in Tor 0.2.9.1-alpha. It is v2 only.]

[NonAnonymous was added in Tor 0.2.9.3-alpha.]

[HS v3 support added 0.3.3.1-alpha]

[ClientV3Auth support added 0.4.6.1-alpha]
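A controller that issues ADD_ONION needs to pull the ServiceID, and possibly the private key and generated client credentials, out of a successful reply. The following Python sketch shows one possible approach; the function name and result layout are assumptions for this example. Reply keys it does not know are skipped rather than rejected.

```python
def parse_add_onion_reply(lines):
    """Parse the lines of a successful ADD_ONION reply (CRLF removed)."""
    result = {"service_id": None, "private_key": None, "client_auth": {}}
    for line in lines:
        if line == "250 OK":
            break
        if not line.startswith("250-"):
            raise ValueError("unexpected reply line: %r" % (line,))
        # Split on the first "=" only: Base64 blobs may end in "=" padding.
        key, _, value = line[4:].partition("=")
        if key == "ServiceID":
            result["service_id"] = value
        elif key == "PrivateKey":
            key_type, _, key_blob = value.partition(":")
            result["private_key"] = (key_type, key_blob)
        elif key == "ClientAuth":
            client_name, _, client_blob = value.partition(":")
            result["client_auth"][client_name] = client_blob
        # Any other key is ignored.
    return result
```

When "DiscardPK" was specified, `private_key` stays `None`, which matches the rule that the key is only returned for newly generated keypairs without that flag.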

DEL_ONION

The syntax is:

"DEL_ONION" SP ServiceID CRLF

ServiceID = The Onion Service address without the trailing ".onion" suffix

Tells the server to remove an Onion ("Hidden") Service, that was previously created via an "ADD_ONION" command. It is only possible to remove Onion Services that were created on the same control connection as the "DEL_ONION" command, and those that belong to no control connection in particular (The "Detach" flag was specified at creation).

If the ServiceID is invalid, or is neither owned by the current control connection nor a detached Onion Service, the server will return a 552.

It is the Onion Service server application's responsibility to close existing client connections if desired after the Onion Service has been removed via "DEL_ONION".

Tor replies with "250 OK" on success, or a 512 if there is an invalid number of arguments, or a 552 if it doesn't recognize the ServiceID.

[DEL_ONION was added in Tor 0.2.7.1-alpha.]

[HS v3 support added 0.3.3.1-alpha]

HSPOST

The syntax is:

"+HSPOST" *[SP "SERVER=" Server] [SP "HSADDRESS=" HSAddress] CRLF
          Descriptor CRLF "." CRLF

Server = LongName
HSAddress = 56*Base32Character
Descriptor = The text of the descriptor formatted as specified in rend-spec.txt section 1.3.

The "HSAddress" key is optional and only applies for v3 descriptors. A 513 error is returned if used with v2.

This command launches a hidden service descriptor upload to the specified HSDirs. If one or more Server arguments are provided, an upload is triggered on each of them in parallel. If no Server options are provided, it behaves like a normal HS descriptor upload and will upload to the set of responsible HS directories.

If any value is unrecognized, a 552 error is returned and the command is stopped. If there is an error in parsing the descriptor, the server must send a "554 Invalid descriptor" reply.

On success, Tor replies "250 OK" then Tor MUST eventually follow this with a HS_DESC event with the result for each upload location.

Examples are:

C: +HSPOST SERVER=9695DFC35FFEB861329B9F1AB04C46397020CE31
[DESCRIPTOR]
.
S: 250 OK

[HSPOST was added in Tor 0.2.7.1-alpha]

ONION_CLIENT_AUTH_ADD

The syntax is:

"ONION_CLIENT_AUTH_ADD" SP HSAddress SP KeyType ":" PrivateKeyBlob
                        [SP "ClientName=" Nickname]
                        [SP "Flags=" TYPE] CRLF

HSAddress = 56*Base32Character
KeyType = "x25519" is the only one supported right now
PrivateKeyBlob = base64 encoding of x25519 key

Tells the connected Tor to add client-side v3 client auth credentials for the onion service with "HSAddress". The "PrivateKeyBlob" is the x25519 private key that should be used for this client, and "Nickname" is an optional nickname for the client.

FLAGS is a comma-separated tuple of flags for this new client. For now, the currently supported flags are:

"Permanent" - This client's credentials should be stored in the filesystem. If this is not set, the client's credentials are ephemeral and stored in memory.

If client auth credentials already existed for this service, replace them with the new ones.

If Tor has cached onion service descriptors that it has been unable to decrypt in the past (due to lack of client auth credentials), attempt to decrypt those descriptors as soon as this command succeeds.

On success, "250 OK" is returned. Otherwise, the following error codes exist:

251 - Client auth credentials for this onion service already existed and were replaced.
252 - Added client auth credentials and successfully decrypted a cached descriptor.
451 - We reached authorized client capacity
512 - Syntax error in "HSAddress", or "PrivateKeyBlob" or "Nickname"
551 - Client with this "Nickname" already exists
552 - Unrecognized KeyType

[ONION_CLIENT_AUTH_ADD was added in Tor 0.4.3.1-alpha]

ONION_CLIENT_AUTH_REMOVE

The syntax is:

"ONION_CLIENT_AUTH_REMOVE" SP HSAddress

KeyType = "x25519" is the only one supported right now

Tells the connected Tor to remove the client-side v3 client auth credentials for the onion service with "HSAddress".

On success "250 OK" is returned. Otherwise, the following error codes exist:

512 - Syntax error in "HSAddress".
251 - Client credentials for "HSAddress" did not exist.

[ONION_CLIENT_AUTH_REMOVE was added in Tor 0.4.3.1-alpha]

ONION_CLIENT_AUTH_VIEW

The syntax is:

"ONION_CLIENT_AUTH_VIEW" [SP HSAddress] CRLF

Tells the connected Tor to list all the stored client-side v3 client auth credentials for "HSAddress". If no "HSAddress" is provided, list all the stored client-side v3 client auth credentials.

The server reply format is:

"250-ONION_CLIENT_AUTH_VIEW" [SP HSAddress] CRLF
*("250-CLIENT" SP HSAddress SP KeyType ":" PrivateKeyBlob
  [SP "ClientName=" Nickname] [SP "Flags=" FLAGS] CRLF)
"250 OK" CRLF

HSAddress = The onion address under which this credential is stored
KeyType = "x25519" is the only one supported right now
PrivateKeyBlob = base64 encoding of x25519 key

"Nickname" is an optional nickname for this client, which can be set either through the ONION_CLIENT_AUTH_ADD command, or it's the filename of this client if the credentials are stored in the filesystem.

FLAGS is a comma-separated field of flags for this client. The currently supported flags are:

"Permanent" - This client's credentials are stored in the filesystem.

On success "250 OK" is returned. Otherwise, the following error codes exist:

512 - Syntax error in "HSAddress".

[ONION_CLIENT_AUTH_VIEW was added in Tor 0.4.3.1-alpha]

DROPOWNERSHIP

The syntax is:

"DROPOWNERSHIP" CRLF

This command instructs Tor to relinquish ownership of its control connection. As such tor will not shut down when this control connection is closed.

This method is idempotent. If the control connection does not already have ownership this method returns successfully, and does nothing.

The controller can call TAKEOWNERSHIP again to re-establish ownership.

[DROPOWNERSHIP was added in Tor 0.4.0.0-alpha]

DROPTIMEOUTS

The syntax is:

"DROPTIMEOUTS" CRLF

Tells the server to drop all circuit build times. Do not invoke this command lightly; it can increase vulnerability to tracking attacks over time.

Tor replies with "250 OK" on success. Tor also emits the BUILDTIMEOUT_SET RESET event right after this "250 OK".

[DROPTIMEOUTS was added in Tor 0.4.5.0-alpha.]

Replies

Reply codes follow the same 3-character format as used by SMTP, with the first character defining a status, the second character defining a subsystem, and the third designating fine-grained information.

The TC protocol currently uses the following first characters:

2yz Positive Completion Reply
    The command was successful; a new request can be started.

4yz Temporary Negative Completion reply
    The command was unsuccessful but might be reattempted later.

5yz Permanent Negative Completion Reply
    The command was unsuccessful; the client should not try exactly that sequence of commands again.

6yz Asynchronous Reply
    Sent out-of-order in response to an earlier SETEVENTS command.

The following second characters are used:

x0z Syntax
    Sent in response to ill-formed or nonsensical commands.

x1z Protocol
    Refers to operations of the Tor Control protocol.

x5z Tor
    Refers to actual operations of Tor system.

The following codes are defined:

250 OK
251 Operation was unnecessary
    [Tor has declined to perform the operation, but no harm was done.]

451 Resource exhausted

500 Syntax error: protocol

510 Unrecognized command
511 Unimplemented command
512 Syntax error in command argument
513 Unrecognized command argument
514 Authentication required
515 Bad authentication

550 Unspecified Tor error

551 Internal error
    [Something went wrong inside Tor, so that the client's request couldn't be fulfilled.]

552 Unrecognized entity
    [A configuration key, a stream ID, circuit ID, event, mentioned in the command did not actually exist.]

553 Invalid configuration value
    [The client tried to set a configuration option to an incorrect, ill-formed, or impossible value.]

554 Invalid descriptor

555 Unmanaged entity

650 Asynchronous event notification

Unless specified to have specific contents, the human-readable messages in error replies should not be relied upon to match those in this document.
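Because the first two characters of a reply code carry the status and subsystem, a controller can act on the code without parsing the human-readable message. The following Python sketch illustrates this; the function name and the label strings are choices made for this example.

```python
STATUS_CLASSES = {
    "2": "positive completion",
    "4": "temporary negative completion",
    "5": "permanent negative completion",
    "6": "asynchronous",
}

SUBSYSTEMS = {
    "0": "syntax",
    "1": "protocol",
    "5": "tor",
}


def classify_reply(code: str):
    """Return (status class, subsystem) for a 3-character reply code."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError("reply codes are three decimal digits: %r" % (code,))
    # Unknown first or second characters are reported, not rejected.
    return (
        STATUS_CLASSES.get(code[0], "unknown"),
        SUBSYSTEMS.get(code[1], "unknown"),
    )
```

For example, `classify_reply("552")` returns `("permanent negative completion", "tor")`, so a controller knows not to retry the same command unchanged.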

Asynchronous events

These replies can be sent after a corresponding SETEVENTS command has been received. They will not be interleaved with other Reply elements, but they can appear between a command and its corresponding reply. For example, this sequence is possible:

C: SETEVENTS CIRC
S: 250 OK
C: GETCONF SOCKSPORT ORPORT
S: 650 CIRC 1000 EXTENDED moria1,moria2
S: 250-SOCKSPORT=9050
S: 250 ORPORT=0

But this sequence is disallowed:

C: SETEVENTS CIRC
S: 250 OK
C: GETCONF SOCKSPORT ORPORT
S: 250-SOCKSPORT=9050
S: 650 CIRC 1000 EXTENDED moria1,moria2
S: 250 ORPORT=0

Clients MUST tolerate more arguments in an asynchronous reply than expected, and MUST tolerate more lines in an asynchronous reply than expected. For instance, a client that expects a CIRC message like:

650 CIRC 1000 EXTENDED moria1,moria2

must tolerate:

650-CIRC 1000 EXTENDED moria1,moria2 0xBEEF
650-EXTRAMAGIC=99
650 ANONYMITY=high

If clients receive extended events (selected by USEFEATURE EXTENDED_EVENTS in Tor 0.1.2.2-alpha..Tor-0.2.1.x, and always-on in Tor 0.2.2.x and later), then each event line as specified below may be followed by additional arguments and additional lines. Additional lines will be of the form:

"650" ("-"/" ") KEYWORD ["=" ARGUMENTS] CRLF

Additional arguments will be of the form

SP KEYWORD ["=" ( QuotedString / * NonSpDquote ) ]

Clients MUST tolerate events with arguments and keywords they do not recognize, and SHOULD process those events as if any unrecognized arguments and keywords were not present.

Clients SHOULD NOT depend on the order of keyword=value arguments, and SHOULD NOT depend on there being no new keyword=value arguments appearing between existing keyword=value arguments, though as of this writing (Jun 2011) some do. Thus, extensions to this protocol should add new keywords only after the existing keywords, until all controllers have been fixed. At some point this "SHOULD NOT" might become a "MUST NOT".
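The tolerance rules above suggest parsing the first line of an event into positional arguments and an unordered set of keyword arguments, then reading only the fields the client understands. The following Python sketch shows one way to do that. The names are hypothetical, and the sketch makes a simplifying assumption: a token counts as a keyword argument only if it starts with an upper-case keyword followed by "=", so values such as a LongName beginning with "$" remain positional. Quoted values are returned still quoted.

```python
import re

# A token is a run of non-space characters; quoted strings may contain spaces.
_TOKEN = re.compile(r'(?:[^\s"]|"(?:[^"\\]|\\.)*")+')
_KEYWORD_ARG = re.compile(r"([A-Z][A-Z0-9_]*)=(.*)", re.S)


def parse_event_line(line):
    """Split the first line of a 650 reply into (type, positional, keywords)."""
    if line[:3] != "650" or line[3:4] not in ("-", " ", "+"):
        raise ValueError("not an asynchronous reply line: %r" % (line,))
    tokens = _TOKEN.findall(line[4:])
    if not tokens:
        raise ValueError("missing event type")
    positional, keywords = [], {}
    for token in tokens[1:]:
        match = _KEYWORD_ARG.fullmatch(token)
        if match:
            # A dict lookup makes the client independent of argument order.
            keywords[match.group(1)] = match.group(2)
        else:
            positional.append(token)
    return tokens[0], positional, keywords
```

A client that expects three positional CIRC arguments would read `positional[:3]` and look up only the keywords it knows, so the extra `0xBEEF` argument in the example above is tolerated without special handling.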

Circuit status changed

The syntax is:

"650" SP "CIRC" SP CircuitID SP CircStatus [SP Path]
      [SP "BUILD_FLAGS=" BuildFlags] [SP "PURPOSE=" Purpose]
      [SP "HS_STATE=" HSState] [SP "REND_QUERY=" HSAddress]
      [SP "TIME_CREATED=" TimeCreated]
      [SP "REASON=" Reason [SP "REMOTE_REASON=" Reason]]
      [SP "SOCKS_USERNAME=" EscapedUsername]
      [SP "SOCKS_PASSWORD=" EscapedPassword]
      [SP "HS_POW=" HSPoW ]
      CRLF

CircStatus =
  "LAUNCHED"   / ; circuit ID assigned to new circuit
  "BUILT"      / ; all hops finished, can now accept streams
  "GUARD_WAIT" / ; all hops finished, waiting to see if a
                 ; circuit with a better guard will be usable.
  "EXTENDED"   / ; one more hop has been completed
  "FAILED"     / ; circuit closed (was not built)
  "CLOSED"       ; circuit closed (was built)

Path = LongName *("," LongName)
  ; In Tor versions 0.1.2.2-alpha through 0.2.2.1-alpha with feature
  ; VERBOSE_NAMES turned off and before version 0.1.2.2-alpha, Path
  ; is as follows:
  ; Path = ServerID *("," ServerID)

BuildFlags = BuildFlag *("," BuildFlag)
BuildFlag = "ONEHOP_TUNNEL" / "IS_INTERNAL" / "NEED_CAPACITY" / "NEED_UPTIME"

Purpose = "GENERAL" / "HS_CLIENT_INTRO" / "HS_CLIENT_REND" /
          "HS_SERVICE_INTRO" / "HS_SERVICE_REND" / "TESTING" /
          "CONTROLLER" / "MEASURE_TIMEOUT" / "HS_VANGUARDS" /
          "PATH_BIAS_TESTING" / "CIRCUIT_PADDING"

HSState = "HSCI_CONNECTING" / "HSCI_INTRO_SENT" / "HSCI_DONE" /
          "HSCR_CONNECTING" / "HSCR_ESTABLISHED_IDLE" /
          "HSCR_ESTABLISHED_WAITING" / "HSCR_JOINED" /
          "HSSI_CONNECTING" / "HSSI_ESTABLISHED" /
          "HSSR_CONNECTING" / "HSSR_JOINED"

HSPoWType = "v1"
HSPoWEffort = 1*DIGIT
HSPoW = HSPoWType "," HSPoWEffort

EscapedUsername = QuotedString
EscapedPassword = QuotedString

HSAddress = 16*Base32Character / 56*Base32Character
Base32Character = ALPHA / "2" / "3" / "4" / "5" / "6" / "7"

TimeCreated = ISOTime2Frac
Seconds = 1*DIGIT
Microseconds = 1*DIGIT

Reason = "NONE" / "TORPROTOCOL" / "INTERNAL" / "REQUESTED" /
         "HIBERNATING" / "RESOURCELIMIT" / "CONNECTFAILED" /
         "OR_IDENTITY" / "OR_CONN_CLOSED" / "TIMEOUT" /
         "FINISHED" / "DESTROYED" / "NOPATH" / "NOSUCHSERVICE" /
         "MEASUREMENT_EXPIRED"

The path is provided only when the circuit has been extended at least one hop.

The "BUILD_FLAGS" field is provided only in versions 0.2.3.11-alpha and later. Clients MUST accept build flags not listed above. Build flags are defined as follows:

ONEHOP_TUNNEL (one-hop circuit, used for tunneled directory conns)
IS_INTERNAL (internal circuit, not to be used for exiting streams)
NEED_CAPACITY (this circuit must use only high-capacity nodes)
NEED_UPTIME (this circuit must use only high-uptime nodes)

The "PURPOSE" field is provided only in versions 0.2.1.6-alpha and later, and only if extended events are enabled (see 3.19). Clients MUST accept purposes not listed above. Purposes are defined as follows:

GENERAL (circuit for AP and/or directory request streams)
HS_CLIENT_INTRO (HS client-side introduction-point circuit)
HS_CLIENT_REND (HS client-side rendezvous circuit; carries AP streams)
HS_SERVICE_INTRO (HS service-side introduction-point circuit)
HS_SERVICE_REND (HS service-side rendezvous circuit)
TESTING (reachability-testing circuit; carries no traffic)
CONTROLLER (circuit built by a controller)
MEASURE_TIMEOUT (circuit being kept around to see how long it takes)
HS_VANGUARDS (circuit created ahead of time when using HS vanguards, and later repurposed as needed)
PATH_BIAS_TESTING (circuit used to probe whether our circuits are being deliberately closed by an attacker)
CIRCUIT_PADDING (circuit that is being held open to disguise its true close time)

The "HS_STATE" field is provided only for hidden-service circuits, and only in versions 0.2.3.11-alpha and later. Clients MUST accept hidden-service circuit states not listed above. Hidden-service circuit states are defined as follows:

HSCI_* (client-side introduction-point circuit states)
  HSCI_CONNECTING (connecting to intro point)
  HSCI_INTRO_SENT (sent INTRODUCE1; waiting for reply from IP)
  HSCI_DONE (received reply from IP relay; closing)

HSCR_* (client-side rendezvous-point circuit states)
  HSCR_CONNECTING (connecting to or waiting for reply from RP)
  HSCR_ESTABLISHED_IDLE (established RP; waiting for introduction)
  HSCR_ESTABLISHED_WAITING (introduction sent to HS; waiting for rend)
  HSCR_JOINED (connected to HS)

HSSI_* (service-side introduction-point circuit states)
  HSSI_CONNECTING (connecting to intro point)
  HSSI_ESTABLISHED (established intro point)

HSSR_* (service-side rendezvous-point circuit states)
  HSSR_CONNECTING (connecting to client's rend point)
  HSSR_JOINED (connected to client's RP circuit)

The "SOCKS_USERNAME" and "SOCKS_PASSWORD" fields indicate the credentials that were used by a SOCKS client to connect to Tor's SOCKS port and initiate this circuit. (Streams for SOCKS clients connected with different usernames and/or passwords are isolated on separate circuits if the IsolateSOCKSAuth flag is active; see Proposal 171.) [Added in Tor 0.4.3.1-alpha.]

The "REND_QUERY" field is provided only for hidden-service-related circuits, and only in versions 0.2.3.11-alpha and later. Clients MUST accept hidden service addresses in formats other than that specified above. [Added in Tor 0.4.3.1-alpha.]

The "TIME_CREATED" field is provided only in versions 0.2.3.11-alpha and later. TIME_CREATED is the time at which the circuit was created or cannibalized. [Added in Tor 0.4.3.1-alpha.]

The "REASON" field is provided only for FAILED and CLOSED events, and only if extended events are enabled (see 3.19). Clients MUST accept reasons not listed above. [Added in Tor 0.4.3.1-alpha.] Reasons are as given in tor-spec.txt, except for:

NOPATH (Not enough nodes to make circuit)
MEASUREMENT_EXPIRED (As "TIMEOUT", except that we had left the circuit open for measurement purposes to see how long it would take to finish.)
IP_NOW_REDUNDANT (Closing a circuit to an introduction point that has become redundant, since some other circuit opened in parallel with it has succeeded.)

The "REMOTE_REASON" field is provided only when we receive a DESTROY or TRUNCATE cell, and only if extended events are enabled. It contains the actual reason given by the remote OR for closing the circuit. Clients MUST accept reasons not listed above. Reasons are as listed in tor-spec.txt. [Added in Tor 0.4.3.1-alpha.]
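The HS_STATE values above share four prefixes that encode which side of the hidden-service protocol the circuit belongs to and which kind of point it targets. The following Python sketch shows how a controller might use the prefix while still accepting states it has never seen; the function name and the returned tuple layout are assumptions for this example.

```python
HS_STATE_GROUPS = {
    "HSCI": ("client", "introduction point"),
    "HSCR": ("client", "rendezvous point"),
    "HSSI": ("service", "introduction point"),
    "HSSR": ("service", "rendezvous point"),
}


def describe_hs_state(state):
    """Return (side, point, detail) for an HS_STATE value, or None if unknown.

    Unknown values return None instead of raising, because clients MUST
    accept hidden-service circuit states that are not listed.
    """
    prefix, separator, detail = state.partition("_")
    group = HS_STATE_GROUPS.get(prefix)
    if not separator or group is None:
        return None
    return group + (detail,)
```

Because only the prefix is checked, a state added later under an existing prefix is still grouped correctly.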

Stream status changed

The syntax is:

    "650" SP "STREAM" SP StreamID SP StreamStatus SP CircuitID SP Target
        [SP "REASON=" Reason [ SP "REMOTE_REASON=" Reason ]]
        [SP "SOURCE=" Source] [ SP "SOURCE_ADDR=" Address ":" Port ]
        [SP "PURPOSE=" Purpose] [SP "SOCKS_USERNAME=" EscapedUsername]
        [SP "SOCKS_PASSWORD=" EscapedPassword]
        [SP "CLIENT_PROTOCOL=" ClientProtocol] [SP "NYM_EPOCH=" NymEpoch]
        [SP "SESSION_GROUP=" SessionGroup] [SP "ISO_FIELDS=" IsoFields]
        CRLF

    StreamStatus =
        "NEW"             / ; New request to connect
        "NEWRESOLVE"      / ; New request to resolve an address
        "REMAP"           / ; Address re-mapped to another
        "SENTCONNECT"     / ; Sent a connect cell along a circuit
        "SENTRESOLVE"     / ; Sent a resolve cell along a circuit
        "SUCCEEDED"       / ; Received a reply; stream established
        "FAILED"          / ; Stream failed and not retriable
        "CLOSED"          / ; Stream closed
        "DETACHED"        / ; Detached from circuit; still retriable
        "CONTROLLER_WAIT" / ; Waiting for controller to use ATTACHSTREAM
                            ; (new in 0.4.5.1-alpha)
        "XOFF_SENT"       / ; XOFF has been sent for this stream
                            ; (new in 0.4.7.5-alpha)
        "XOFF_RECV"       / ; XOFF has been received for this stream
                            ; (new in 0.4.7.5-alpha)
        "XON_SENT"        / ; XON has been sent for this stream
                            ; (new in 0.4.7.5-alpha)
        "XON_RECV"          ; XON has been received for this stream
                            ; (new in 0.4.7.5-alpha)

    Target = TargetAddress ":" Port
    Port = an integer from 0 to 65535 inclusive
    TargetAddress = Address / "(Tor_internal)"

    EscapedUsername = QuotedString
    EscapedPassword = QuotedString

    ClientProtocol =
        "SOCKS4" / "SOCKS5" / "TRANS" / "NATD" / "DNS" / "HTTPCONNECT" /
        "UNKNOWN"

    NymEpoch = a nonnegative integer
    SessionGroup = an integer

    IsoFields = a comma-separated list of IsoField values

    IsoField =
        "CLIENTADDR" / "CLIENTPORT" / "DESTADDR" / "DESTPORT" /
        the name of a field that is valid for STREAM events

The circuit ID designates which circuit this stream is attached to. If the stream is unattached, the circuit ID "0" is given. The target indicates the address which the stream is meant to resolve or connect to; it can be "(Tor_internal)" for a virtual stream created by the Tor program to talk to itself.

    Reason = "MISC" / "RESOLVEFAILED" / "CONNECTREFUSED" /
             "EXITPOLICY" / "DESTROY" / "DONE" / "TIMEOUT" /
             "NOROUTE" / "HIBERNATING" / "INTERNAL" / "RESOURCELIMIT" /
             "CONNRESET" / "TORPROTOCOL" / "NOTDIRECTORY" / "END" /
             "PRIVATE_ADDR"

The "REASON" field is provided only for FAILED, CLOSED, and DETACHED events, and only if extended events are enabled (see 3.19). Clients MUST accept reasons not listed above. Reasons are as given in tor-spec.txt, except for:

    END          (We received a RELAY_END cell from the other side of this stream.)
    PRIVATE_ADDR (The client tried to connect to a private address like 127.0.0.1 or 10.0.0.1 over Tor.)

    [XXXX document more. -NM]

The "REMOTE_REASON" field is provided only when we receive a RELAY_END cell, and only if extended events are enabled. It contains the actual reason given by the remote OR for closing the stream. Clients MUST accept reasons not listed above. Reasons are as listed in tor-spec.txt.

"REMAP" events include a Source if extended events are enabled:

    Source = "CACHE" / "EXIT"

Clients MUST accept sources not listed above. "CACHE" is given if the Tor client decided to remap the address because of a cached answer, and "EXIT" is given if the remote node we queried gave us the new address as a response.

The "SOURCE_ADDR" field is included with NEW and NEWRESOLVE events if extended events are enabled. It indicates the address and port that requested the connection, and can be (e.g.) used to look up the requesting program.

    Purpose = "DIR_FETCH" / "DIR_UPLOAD" / "DNS_REQUEST" /
              "USER" / "DIRPORT_TEST"

The "PURPOSE" field is provided only for NEW and NEWRESOLVE events, and only if extended events are enabled (see 3.19). Clients MUST accept purposes not listed above. The purposes above are defined as:

    "DIR_FETCH" -- This stream is generated internally to Tor for fetching directory information.
    "DIR_UPLOAD" -- An internal stream for uploading information to a directory authority.
    "DIRPORT_TEST" -- A stream we're using to test our own directory port to make sure it's reachable.
    "DNS_REQUEST" -- A user-initiated DNS request.
    "USER" -- This stream is handling user traffic, OR it's internal to Tor, but it doesn't match one of the purposes above.

The "SOCKS_USERNAME" and "SOCKS_PASSWORD" fields indicate the credentials that were used by a SOCKS client to connect to Tor's SOCKS port and initiate this stream. (Streams for SOCKS clients connected with different usernames and/or passwords are isolated on separate circuits if the IsolateSOCKSAuth flag is active; see Proposal 171.)

The "CLIENT_PROTOCOL" field indicates the protocol that was used by a client to initiate this stream. (Streams for clients connected with different protocols are isolated on separate circuits if the IsolateClientProtocol flag is active.) Controllers MUST tolerate unrecognized client protocols.

The "NYM_EPOCH" field indicates the nym epoch that was active when a client initiated this stream. The epoch increments when the NEWNYM signal is received. (Streams with different nym epochs are isolated on separate circuits.)

The "SESSION_GROUP" field indicates the session group of the listener port that a client used to initiate this stream. By default, the session group is different for each listener port, but this can be overridden for a listener via the "SessionGroup" option in torrc. (Streams with different session groups are isolated on separate circuits.)

The "ISO_FIELDS" field indicates the set of STREAM event fields for which stream isolation is enabled for the listener port that a client used to initiate this stream. The special values "CLIENTADDR", "CLIENTPORT", "DESTADDR", and "DESTPORT", if their correspondingly named fields are not present, refer to the Address and Port components of the "SOURCE_ADDR" and Target fields.

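As an illustration of the STREAM syntax above, the following is a minimal sketch of a controller-side parser. The function name `parse_stream_event` and the returned dictionary layout are assumptions made for this example, not part of the protocol. Quoted values are returned raw (quotes included), and unknown statuses or fields are passed through rather than rejected.

```python
import re

# KEY=value, where value is a double-quoted string (backslash escapes
# allowed) or a run of non-space characters.
_FIELD = re.compile(r'([A-Z_]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_stream_event(line):
    """Split one single-line STREAM event into positional and keyword parts."""
    parts = line.rstrip("\r\n").split(" ", 6)
    if len(parts) < 6 or parts[0] != "650" or parts[1] != "STREAM":
        raise ValueError("not a STREAM event")
    stream_id, status, circ_id, target = parts[2:6]
    # Split on the last colon so bracketed IPv6 targets keep their address.
    host, _, port = target.rpartition(":")
    rest = parts[6] if len(parts) > 6 else ""
    return {
        "stream_id": stream_id,
        "status": status,              # may be a value this code has never seen
        "circuit_id": circ_id,
        "attached": circ_id != "0",    # circuit ID "0" means unattached
        "host": host,
        "port": int(port),
        "fields": dict(_FIELD.findall(rest)),
    }
```

A controller would typically look up optional fields with `event["fields"].get("PURPOSE")` so that missing or unrecognized fields do not cause failures.
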
OR Connection status changed

The syntax is:

    "650" SP "ORCONN" SP (LongName / Target) SP ORStatus [ SP "REASON=" Reason ]
        [ SP "NCIRCS=" NumCircuits ] [ SP "ID=" ConnID ] CRLF

    ORStatus = "NEW" / "LAUNCHED" / "CONNECTED" / "FAILED" / "CLOSED"

    ; In Tor versions 0.1.2.2-alpha through 0.2.2.1-alpha with feature
    ; VERBOSE_NAMES turned off and before version 0.1.2.2-alpha, OR
    ; Connection is as follows:
    "650" SP "ORCONN" SP (ServerID / Target) SP ORStatus [ SP "REASON=" Reason ]
        [ SP "NCIRCS=" NumCircuits ] CRLF

NEW is for incoming connections, and LAUNCHED is for outgoing connections. CONNECTED means the TLS handshake has finished (in either direction). FAILED means a connection is being closed that hasn't finished its handshake, and CLOSED is for connections that have completed their handshake.

A LongName or ServerID is specified unless it's a NEW connection, in which case we don't know what server it is yet, so we use Address:Port.

If extended events are enabled (see 3.19), optional reason and circuit counting information is provided for CLOSED and FAILED events.

    Reason = "MISC" / "DONE" / "CONNECTREFUSED" /
             "IDENTITY" / "CONNECTRESET" / "TIMEOUT" / "NOROUTE" /
             "IOERROR" / "RESOURCELIMIT" / "PT_MISSING"

NumCircuits counts both established and pending circuits.

The ORStatus values are as follows:

    NEW -- We have received a new incoming OR connection, and are starting the server-side handshake.
    LAUNCHED -- We have launched a new outgoing OR connection, and are starting the client-side handshake.
    CONNECTED -- The OR connection has been connected and the handshake is done.
    FAILED -- Our attempt to open the OR connection failed.
    CLOSED -- The OR connection closed in an unremarkable way.

The Reason values for closed/failed OR connections are:

    DONE -- The OR connection has shut down cleanly.
    CONNECTREFUSED -- We got an ECONNREFUSED while connecting to the target OR.
    IDENTITY -- We connected to the OR, but found that its identity was not what we expected.
    CONNECTRESET -- We got an ECONNRESET or similar IO error from the connection with the OR.
    TIMEOUT -- We got an ETIMEOUT or similar IO error from the connection with the OR, or we're closing the connection for being idle for too long.
    NOROUTE -- We got an ENOTCONN, ENETUNREACH, ENETDOWN, EHOSTUNREACH, or similar error while connecting to the OR.
    IOERROR -- We got some other IO error on our connection to the OR.
    RESOURCELIMIT -- We don't have enough operating system resources (file descriptors, buffers, etc) to connect to the OR.
    PT_MISSING -- No pluggable transport was available.
    MISC -- The OR connection closed for some other reason.

[First added ID parameter in 0.2.5.2-alpha]
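
The ORCONN syntax above could be consumed along the following lines. This is only a sketch: `parse_orconn`, `KNOWN_STATUSES`, and the result keys are names invented for the example, and the relay fingerprint in the usage note is made up. The optional REASON, NCIRCS, and ID fields are treated as absent unless present.

```python
KNOWN_STATUSES = {"NEW", "LAUNCHED", "CONNECTED", "FAILED", "CLOSED"}


def parse_orconn(line):
    """Parse a single ORCONN event line into a dictionary."""
    parts = line.strip().split(" ")
    if len(parts) < 4 or parts[:2] != ["650", "ORCONN"]:
        raise ValueError("not an ORCONN event")
    # Everything after the status is an optional KEY=value field.
    fields = dict(p.split("=", 1) for p in parts[4:] if "=" in p)
    ncircs = fields.get("NCIRCS")
    return {
        "target": parts[2],            # LongName, ServerID, or Address:Port
        "status": parts[3],
        "known_status": parts[3] in KNOWN_STATUSES,
        "reason": fields.get("REASON"),
        "ncircs": int(ncircs) if ncircs and ncircs.isdigit() else None,
        "conn_id": fields.get("ID"),
    }
```

For example, `parse_orconn("650 ORCONN $0123456789ABCDEF0123456789ABCDEF01234567~example FAILED REASON=TIMEOUT NCIRCS=2 ID=18")` yields a dictionary whose `reason` is `"TIMEOUT"` and whose `ncircs` is `2`.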

Bandwidth used in the last second

The syntax is:

    "650" SP "BW" SP BytesRead SP BytesWritten *(SP Type "=" Num) CRLF
    BytesRead = 1*DIGIT
    BytesWritten = 1*DIGIT
    Type = "DIR" / "OR" / "EXIT" / "APP" / ...
    Num = 1*DIGIT

BytesRead and BytesWritten are the totals. [In a future Tor version, we may also include a breakdown of the connection types that used bandwidth this second (not implemented yet).]
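
A minimal sketch of reading a BW event follows. The function name `parse_bw` is an assumption for this example. Because the per-type breakdown is described above as not implemented yet, the sketch simply collects any well-formed `Type=Num` pairs it finds and ignores everything else.

```python
def parse_bw(line):
    """Return (bytes_read, bytes_written, breakdown) for one BW event."""
    parts = line.strip().split(" ")
    if len(parts) < 4 or parts[:2] != ["650", "BW"]:
        raise ValueError("not a BW event")
    bytes_read, bytes_written = int(parts[2]), int(parts[3])
    breakdown = {}
    for item in parts[4:]:
        key, sep, value = item.partition("=")
        if sep and value.isdigit():    # skip anything that is not Type=Num
            breakdown[key] = int(value)
    return bytes_read, bytes_written, breakdown
```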

Log messages

The syntax is:

"650" SP Severity SP ReplyText CRLF

or

"650+" Severity CRLF Data 650 SP "OK" CRLF

Severity = "DEBUG" / "INFO" / "NOTICE" / "WARN" / "ERR"

Some low-level logs may be sent from signal handlers, so their destination logs must be signal-safe. These low-level logs include backtraces, logging function errors, and errors in code called by logging functions. Signal-safe logs are never sent as control port log events.

Control port message trace debug logs are never sent as control port log events, to avoid modifying control output when debugging.

New descriptors available

This event is generated when new router descriptors (not microdescs or extrainfos or anything else) are received.

Syntax:

    "650" SP "NEWDESC" 1*(SP LongName) CRLF

    ; In Tor versions 0.1.2.2-alpha through 0.2.2.1-alpha with feature
    ; VERBOSE_NAMES turned off and before version 0.1.2.2-alpha, it
    ; is as follows:
    "650" SP "NEWDESC" 1*(SP ServerID) CRLF

New Address mapping

These events are generated when a new address mapping is entered in Tor's address map cache, or when the answer for a RESOLVE command is found. Entries can be created by a successful or failed DNS lookup, a successful or failed connection attempt, a RESOLVE command, a MAPADDRESS command, the AutomapHostsOnResolve feature, or the TrackHostExits feature.

Syntax:

    "650" SP "ADDRMAP" SP Address SP NewAddress SP Expiry
        [SP "error=" ErrorCode] [SP "EXPIRES=" UTCExpiry]
        [SP "CACHED=" Cached] [SP "STREAMID=" StreamId] CRLF

    NewAddress = Address / "<error>"
    Expiry = DQUOTE ISOTime DQUOTE / "NEVER"

    ErrorCode = "yes" / "internal" / "Unable to launch resolve request"
    UTCExpiry = DQUOTE ISOTime DQUOTE

    Cached = DQUOTE "YES" DQUOTE / DQUOTE "NO" DQUOTE
    StreamId = DQUOTE StreamId DQUOTE

Error and UTCExpiry are only provided if extended events are enabled. The values for Error are mostly useless. Future values will be chosen to match 1*(ALNUM / "_"); the "Unable to launch resolve request" value is a bug in Tor before 0.2.4.7-alpha.

Expiry is expressed as the local time (rather than UTC). This is a bug, left in for backward compatibility; new code should look at UTCExpiry instead. (If Expiry is "NEVER", UTCExpiry is omitted.)

Cached indicates whether the mapping will be stored until it expires, or if it is just a notification in response to a RESOLVE command.

StreamId is the global stream identifier of the stream or circuit from which the address was resolved.
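
The following sketch shows one way a controller might read an ADDRMAP event while following the advice above to prefer the UTC "EXPIRES" field over the local-time Expiry. The names `parse_addrmap` and the result keys are assumptions for this example, and the timestamp format is assumed to be "YYYY-MM-DD HH:MM:SS". The sketch does not handle the unquoted, space-containing "Unable to launch resolve request" error value produced by old Tor versions.

```python
import re
from datetime import datetime, timezone

# A token is a run of non-space characters in which double-quoted
# substrings may contain spaces.
_TOKEN = re.compile(r'(?:[^\s"]|"[^"]*")+')


def parse_addrmap(line):
    """Parse one ADDRMAP event, preferring the UTC expiry when present."""
    tokens = _TOKEN.findall(line)
    if len(tokens) < 5 or tokens[:2] != ["650", "ADDRMAP"]:
        raise ValueError("not an ADDRMAP event")
    address, new_address, expiry = tokens[2:5]
    fields = {}
    for token in tokens[5:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value.strip('"')
    expires_utc = None
    if "EXPIRES" in fields:
        # Assumed format; EXPIRES is documented as UTC.
        expires_utc = datetime.strptime(
            fields["EXPIRES"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
    return {
        "address": address,
        "new_address": None if new_address == "<error>" else new_address,
        "never_expires": expiry == "NEVER",
        "expires_utc": expires_utc,
        "cached": fields.get("CACHED"),
        "error": fields.get("error"),
    }
```

The local-time Expiry is only inspected for the "NEVER" case, since UTCExpiry is omitted then.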

Descriptors uploaded to us in our role as authoritative dirserver

[NOTE: This feature was removed in Tor 0.3.2.1-alpha.]

Tor generates this event when it's a directory authority, and somebody has just uploaded a server descriptor.

Syntax:

    "650" "+" "AUTHDIR_NEWDESCS" CRLF Action CRLF Message CRLF
        Descriptor CRLF "." CRLF "650" SP "OK" CRLF
    Action = "ACCEPTED" / "DROPPED" / "REJECTED"
    Message = Text

The Descriptor field is the text of the server descriptor; the Action field is "ACCEPTED" if we're accepting the descriptor as the new best valid descriptor for its router, "REJECTED" if we aren't taking the descriptor and we're complaining to the uploading relay about it, and "DROPPED" if we decide to drop the descriptor without complaining. The Message field is a human-readable string explaining why we chose the Action. (It doesn't contain newlines.)

Our descriptor changed

Syntax:

"650" SP "DESCCHANGED" CRLF

[First added in 0.1.2.2-alpha.]

Status events

Status events (STATUS_GENERAL, STATUS_CLIENT, and STATUS_SERVER) are sent based on occurrences in the Tor process pertaining to the general state of the program. Generally, they correspond to log messages of severity Notice or higher. They differ from log messages in that their format is a specified interface.

Syntax:

    "650" SP StatusType SP StatusSeverity SP StatusAction
                                        [SP StatusArguments] CRLF

    StatusType = "STATUS_GENERAL" / "STATUS_CLIENT" / "STATUS_SERVER"
    StatusSeverity = "NOTICE" / "WARN" / "ERR"
    StatusAction = 1*ALPHA
    StatusArguments = StatusArgument *(SP StatusArgument)
    StatusArgument = StatusKeyword '=' StatusValue
    StatusKeyword = 1*(ALNUM / "_")
    StatusValue = 1*(ALNUM / '_') / QuotedString

StatusAction is a string, and StatusArguments is a series of keyword=value pairs on the same line. Values may be space-terminated strings, or quoted strings.

These events are always produced with EXTENDED_EVENTS and VERBOSE_NAMES; see the explanations in the USEFEATURE section for details.

Controllers MUST tolerate unrecognized actions, MUST tolerate unrecognized arguments, MUST tolerate missing arguments, and MUST tolerate arguments that arrive in any order.

Each event description below is accompanied by a recommendation for controllers. These recommendations are suggestions only; no controller is required to implement them.

Compatibility note: versions of Tor before 0.2.0.22-rc incorrectly generated "STATUS_SERVER" as "STATUS_SEVER". To be compatible with those versions, tools should accept both.
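
The tolerance requirements and the compatibility note above can be sketched as follows. The name `parse_status_event` is hypothetical, the sample TAG and SUMMARY values in the usage note are illustrative only, and the unescaping of quoted strings is deliberately simplified (a backslash just protects the next character).

```python
import re

# keyword=value, where value is a quoted string or a space-terminated string.
_ARG = re.compile(r'([A-Za-z0-9_]+)=("(?:[^"\\]|\\.)*"|\S*)')
# "STATUS_SEVER" is accepted for compatibility with Tor before 0.2.0.22-rc.
_TYPES = {"STATUS_GENERAL", "STATUS_CLIENT", "STATUS_SERVER", "STATUS_SEVER"}


def parse_status_event(line):
    """Return (type, severity, action, args) for one status event line."""
    parts = line.strip().split(" ", 4)
    if len(parts) < 4 or parts[0] != "650" or parts[1] not in _TYPES:
        raise ValueError("not a status event")
    status_type = "STATUS_SERVER" if parts[1] == "STATUS_SEVER" else parts[1]
    args = {}
    for key, value in _ARG.findall(parts[4] if len(parts) > 4 else ""):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            # Simplified unescaping: drop the backslash, keep the character.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        args[key] = value              # unknown keywords are kept, not rejected
    return status_type, parts[2], parts[3], args
```

Callers would read arguments with `args.get(...)`, which copes with missing arguments and with arguments arriving in any order. For instance, `parse_status_event('650 STATUS_CLIENT NOTICE BOOTSTRAP PROGRESS=80 TAG=example_tag SUMMARY="Connecting to the Tor network"')` returns the action `"BOOTSTRAP"` together with a three-entry argument dictionary.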

Actions for STATUS_GENERAL events can be as follows:

    CLOCK_JUMPED
    "TIME=NUM"
      Tor spent enough time without CPU cycles that it has closed all its circuits and will establish them anew. This typically happens when a laptop goes to sleep and then wakes up again. It also happens when the system is swapping so heavily that Tor is starving. The "time" argument specifies the number of seconds Tor thinks it was unconscious for (or alternatively, the number of seconds it went back in time).

      This status event is sent as NOTICE severity normally, but WARN severity if Tor is acting as a server currently.

      {Recommendation for controller: ignore it, since we don't really know what the user should do anyway. Hm.}

    DANGEROUS_VERSION
    "CURRENT=version"
    "REASON=NEW/OBSOLETE/UNRECOMMENDED"
    "RECOMMENDED=\"version, version, ...\""
      Tor has found that directory servers don't recommend its version of the Tor software. RECOMMENDED is a comma-and-space-separated string of Tor versions that are recommended. REASON is NEW if this version of Tor is newer than any recommended version, OBSOLETE if this version of Tor is older than any recommended version, and UNRECOMMENDED if some recommended versions of Tor are newer and some are older than this version. (The "OBSOLETE" reason was called "OLD" from Tor 0.1.2.3-alpha up to and including 0.2.0.12-alpha.)

      {Controllers may want to suggest that the user upgrade OLD or UNRECOMMENDED versions. NEW versions may be known-insecure, or may simply be development versions.}

    TOO_MANY_CONNECTIONS
    "CURRENT=NUM"
      Tor has reached its ulimit -n or whatever the native limit is on file descriptors or sockets. CURRENT is the number of sockets Tor currently has open. The user should really do something about this.

      {Controllers may recommend that the user increase the limit, or increase it for them. Recommendations should be phrased in an OS-appropriate way and automated when possible.}

    BUG
    "REASON=STRING"
      Tor has encountered a situation that its developers never expected, and the developers would like to learn that it happened. Perhaps the controller can explain this to the user and encourage her to file a bug report?

      {Controllers should log bugs, but shouldn't annoy the user in case a bug appears frequently.}

    CLOCK_SKEW
      SKEW="+" / "-" SECONDS
      MIN_SKEW="+" / "-" SECONDS
      SOURCE="DIRSERV:" IP ":" Port /
             "NETWORKSTATUS:" IP ":" Port /
             "OR:" IP ":" Port /
             "CONSENSUS"
      If "SKEW" is present, it's an estimate of how far we are from the time declared in the source. (In other words, if we're an hour in the past, the value is -3600.) If "MIN_SKEW" is present, it's a lower bound. If the source is a DIRSERV, we got the current time from a connection to a dirserver. If the source is a NETWORKSTATUS, we decided we're skewed because we got a v2 networkstatus from far in the future. If the source is OR, the skew comes from a NETINFO cell from a connection to another relay. If the source is CONSENSUS, we decided we're skewed because we got a networkstatus consensus from the future.

      {Tor should send this message to controllers when it thinks the skew is so high that it will interfere with proper Tor operation. Controllers shouldn't blindly adjust the clock, since the more accurate source of skew info (DIRSERV) is currently unauthenticated.}

    BAD_LIBEVENT
    "METHOD=" libevent method
    "VERSION=" libevent version
    "BADNESS=" "BROKEN" / "BUGGY" / "SLOW"
    "RECOVERED=" "NO" / "YES"
      Tor knows about bugs in using the configured event method in this version of libevent. "BROKEN" libevents won't work at all; "BUGGY" libevents might work okay; "SLOW" libevents will work fine, but not quickly. If "RECOVERED" is YES, Tor managed to switch to a more reliable (but probably slower!) libevent method.

      {Controllers may want to warn the user if this event occurs, though generally it's the fault of whoever built the Tor binary and there's not much the user can do besides upgrade libevent or upgrade the binary.}

    DIR_ALL_UNREACHABLE
      Tor believes that none of the known directory servers are reachable -- this is most likely because the local network is down or otherwise not working, and might help to explain to the user why Tor appears to be broken.

      {Controllers may want to warn the user if this event occurs; further action is generally not possible.}

Actions for STATUS_CLIENT events can be as follows:

    BOOTSTRAP
    "PROGRESS=" num
    "TAG=" Keyword
    "SUMMARY=" String
    ["WARNING=" String]
    ["REASON=" Keyword]
    ["COUNT=" num]
    ["RECOMMENDATION=" Keyword]
    ["HOST=" QuotedString]
    ["HOSTADDR=" QuotedString]

      Tor has made some progress at establishing a connection to the Tor network, fetching directory information, or making its first circuit; or it has encountered a problem while bootstrapping. This status event is especially useful for users with slow connections or with connectivity problems.

      "Progress" gives a number between 0 and 100 for how far through the bootstrapping process we are. "Summary" is a string that can be displayed to the user to describe the *next* task that Tor will tackle, i.e., the task it is working on after sending the status event. "Tag" is a string that controllers can use to recognize bootstrap phases, if they want to do something smarter than just blindly displaying the summary string; see Section 5 for the current tags that Tor issues.

      The StatusSeverity describes whether this is a normal bootstrap phase (severity notice) or an indication of a bootstrapping problem (severity warn).

      For bootstrap problems, we include the same progress, tag, and summary values as we would for a normal bootstrap event, but we also include "warning", "reason", "count", and "recommendation" key/value combos. The "count" number tells how many bootstrap problems there have been so far at this phase. The "reason" string lists one of the reasons allowed in the ORCONN event. The "warning" argument is a string with any hints Tor has to offer about why it's having trouble bootstrapping.

      The "reason" values are long-term-stable controller-facing tags to identify particular issues in a bootstrapping step. The warning strings, on the other hand, are human-readable. Controllers SHOULD NOT rely on the format of any warning string. Currently the possible values for "recommendation" are either "ignore" or "warn" -- if ignore, the controller can accumulate the string in a pile of problems to show the user if the user asks; if warn, the controller should alert the user that Tor is pretty sure there's a bootstrapping problem.

      The "host" value is the identity digest (in hex) of the node we're trying to connect to; the "hostaddr" is an address:port combination, where 'address' is an ipv4 or ipv6 address.

      Currently Tor uses recommendation=ignore for the first nine bootstrap problem reports for a given phase, and then uses recommendation=warn for subsequent problems at that phase. Hopefully this is a good balance between tolerating occasional errors and reporting serious problems quickly.

    ENOUGH_DIR_INFO
      Tor now knows enough network-status documents and enough server descriptors that it's going to start trying to build circuits now. [Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. If not, Tor will only build internal circuits.]

      {Controllers may want to use this event to decide when to indicate progress to their users, but should not interrupt the user's browsing to tell them so.}

    NOT_ENOUGH_DIR_INFO
      We discarded expired statuses and server descriptors to fall below the desired threshold of directory information. We won't try to build any circuits until ENOUGH_DIR_INFO occurs again.

      {Controllers may want to use this event to decide when to indicate progress to their users, but should not interrupt the user's browsing to tell them so.}

    CIRCUIT_ESTABLISHED
      Tor is able to establish circuits for client use. This event will only be sent if we just built a circuit that changed our mind -- that is, prior to this event we didn't know whether we could establish circuits.

      {Suggested use: controllers can notify their users that Tor is ready for use as a client once they see this status event. [Perhaps controllers should also have a timeout if too much time passes and this event hasn't arrived, to give tips on how to troubleshoot. On the other hand, hopefully Tor will send further status events if it can identify the problem.]}

    CIRCUIT_NOT_ESTABLISHED
    "REASON=" "EXTERNAL_ADDRESS" / "DIR_ALL_UNREACHABLE" / "CLOCK_JUMPED"
      We are no longer confident that we can build circuits. The "reason" keyword provides an explanation: which other status event type caused our lack of confidence.

      {Controllers may want to use this event to decide when to indicate progress to their users, but should not interrupt the user's browsing to do so.}

      [Note: only REASON=CLOCK_JUMPED is implemented currently.]

    CONSENSUS_ARRIVED
      Tor has received and validated a new consensus networkstatus. (This event can be delayed a little while after the consensus is received, if Tor needs to fetch certificates.)

    DANGEROUS_PORT
    "PORT=" port
    "RESULT=" "REJECT" / "WARN"
      A stream was initiated to a port that's commonly used for vulnerable-plaintext protocols. If the Result is "reject", we refused the connection; whereas if it's "warn", we allowed it.

      {Controllers should warn their users when this occurs, unless they happen to know that the application using Tor is in fact doing so correctly (e.g., because it is part of a distributed bundle). They might also want some sort of interface to let the user configure their RejectPlaintextPorts and WarnPlaintextPorts config options.}

    DANGEROUS_SOCKS
    "PROTOCOL=" "SOCKS4" / "SOCKS5"
    "ADDRESS=" IP:port
      A connection was made to Tor's SOCKS port using one of the SOCKS approaches that doesn't support hostnames -- only raw IP addresses. If the client application got this address from gethostbyname(), it may be leaking target addresses via DNS.

      {Controllers should warn their users when this occurs, unless they happen to know that the application using Tor is in fact doing so correctly (e.g., because it is part of a distributed bundle).}

    SOCKS_UNKNOWN_PROTOCOL
    "DATA=string"
      A connection was made to Tor's SOCKS port that tried to use it for something other than the SOCKS protocol. Perhaps the user is using Tor as an HTTP proxy? The DATA is the first few characters sent to Tor on the SOCKS port.

      {Controllers may want to warn their users when this occurs: it indicates a misconfigured application.}

    SOCKS_BAD_HOSTNAME
    "HOSTNAME=QuotedString"
      Some application gave us a funny-looking hostname. Perhaps it is broken? In any case it won't work with Tor and the user should know.

      {Controllers may want to warn their users when this occurs: it usually indicates a misconfigured application.}

Actions for STATUS_SERVER can be as follows:

    EXTERNAL_ADDRESS
    "ADDRESS=IP"
    "HOSTNAME=NAME"
    "METHOD=CONFIGURED/CONFIGURED_ORPORT/DIRSERV/RESOLVED/INTERFACE/GETHOSTNAME"
      Our best idea for our externally visible IP has changed to 'IP'. If 'HOSTNAME' is present, we got the new IP by resolving 'NAME'. If the method is 'CONFIGURED', the IP was given verbatim as the Address configuration option. If the method is 'CONFIGURED_ORPORT', the IP was given verbatim in the ORPort configuration option. If the method is 'RESOLVED', we resolved the Address configuration option to get the IP. If the method is 'GETHOSTNAME', we resolved our hostname to get the IP.
      If the method is 'INTERFACE', we got the address of one of our network interfaces to get the IP. If the method is 'DIRSERV', a directory server told us a guess for what our IP might be.

      {Controllers may want to record this info and display it to the user.}

    CHECKING_REACHABILITY
    "ORADDRESS=IP:port"
    "DIRADDRESS=IP:port"
      We're going to start testing the reachability of our external OR port or directory port.

      {This event could affect the controller's idea of server status, but the controller should not interrupt the user to tell them so.}

    REACHABILITY_SUCCEEDED
    "ORADDRESS=IP:port"
    "DIRADDRESS=IP:port"
      We successfully verified the reachability of our external OR port or directory port (depending on which of ORADDRESS or DIRADDRESS is given.)

      {This event could affect the controller's idea of server status, but the controller should not interrupt the user to tell them so.}

    GOOD_SERVER_DESCRIPTOR
      We successfully uploaded our server descriptor to at least one of the directory authorities, with no complaints.

      {Originally, the goal of this event was to declare "every authority has accepted the descriptor, so there will be no complaints about it." But since some authorities might be offline, it's harder to get certainty than we had thought. As such, this event is equivalent to ACCEPTED_SERVER_DESCRIPTOR below. Controllers should just look at ACCEPTED_SERVER_DESCRIPTOR and should ignore this event for now.}

    SERVER_DESCRIPTOR_STATUS
    "STATUS=" "LISTED" / "UNLISTED"
      We just got a new networkstatus consensus, and whether we're in it or not in it has changed. Specifically, status is "listed" if we're listed in it but previous to this point we didn't know we were listed in a consensus; and status is "unlisted" if we thought we should have been listed in it (e.g. we were listed in the last one), but we're not.

      {Moving from listed to unlisted is not necessarily cause for alarm. The relay might have failed a few reachability tests, or the Internet might have had some routing problems. So this feature is mainly to let relay operators know when their relay has successfully been listed in the consensus.}

      [Not implemented yet. We should do this in 0.2.2.x. -RD]

    NAMESERVER_STATUS
    "NS=addr"
    "STATUS=" "UP" / "DOWN"
    "ERR=" message
      One of our nameservers has changed status.

      {This event could affect the controller's idea of server status, but the controller should not interrupt the user to tell them so.}

    NAMESERVER_ALL_DOWN
      All of our nameservers have gone down.

      {This is a problem; if it happens often without the nameservers coming up again, the user needs to configure more or better nameservers.}

    DNS_HIJACKED
      Our DNS provider is providing an address when it should be saying "NOTFOUND"; Tor will treat the address as a synonym for "NOTFOUND".

      {This is an annoyance; controllers may want to tell admins that their DNS provider is not to be trusted.}

    DNS_USELESS
      Our DNS provider is giving a hijacked address instead of well-known websites; Tor will not try to be an exit node.

      {Controllers could warn the admin if the relay is running as an exit node: the admin needs to configure a good DNS server. Alternatively, this happens a lot in some restrictive environments (hotels, universities, coffeeshops) when the user hasn't registered.}

    BAD_SERVER_DESCRIPTOR
    "DIRAUTH=addr:port"
    "REASON=string"
      A directory authority rejected our descriptor. Possible reasons include malformed descriptors, incorrect keys, highly skewed clocks, and so on.

      {Controllers should warn the admin, and try to cope if they can.}

    ACCEPTED_SERVER_DESCRIPTOR
    "DIRAUTH=addr:port"
      A single directory authority accepted our descriptor.
      // actually notice

      {This event could affect the controller's idea of server status, but the controller should not interrupt the user to tell them so.}

    REACHABILITY_FAILED
    "ORADDRESS=IP:port"
    "DIRADDRESS=IP:port"
      We failed to connect to our external OR port or directory port successfully.

      {This event could affect the controller's idea of server status. The controller should warn the admin and suggest reasonable steps to take.}

    HIBERNATION_STATUS
    "STATUS=" "AWAKE" / "SOFT" / "HARD"
      Our bandwidth based accounting status has changed, and we are now relaying traffic/rejecting new connections/hibernating.

      {This event could affect the controller's idea of server status. The controller MAY inform the admin, though presumably the accounting was explicitly enabled for a reason.}

      [This event was added in tor 0.2.9.0-alpha.]

Our set of guard nodes has changed

Syntax:

    "650" SP "GUARD" SP Type SP Name SP Status ... CRLF
    Type = "ENTRY"
    Name = ServerSpec
      (Identifies the guard affected)
    Status = "NEW" / "UP" / "DOWN" / "BAD" / "GOOD" / "DROPPED"

The ENTRY type indicates a guard used for connections to the Tor network.

The Status values are:

    "NEW" -- This node was not previously used as a guard; now we have picked it as one.
    "DROPPED" -- This node is one we previously picked as a guard; we no longer consider it to be a member of our guard list.
    "UP" -- The guard now seems to be reachable.
    "DOWN" -- The guard now seems to be unreachable.
    "BAD" -- Because of flags set in the consensus and/or values in the configuration, this node is now unusable as a guard.
    "BAD_L2" -- This layer2 guard has expired or got removed from the consensus. This node is removed from the layer2 guard set.
    "GOOD" -- Because of flags set in the consensus and/or values in the configuration, this node is now usable as a guard.

Controllers must accept unrecognized types and unrecognized statuses.

Network status has changed

Syntax:

"650" "+" "NS" CRLF 1*NetworkStatus "." CRLF "650" SP "OK" CRLF

The event is used whenever our local view of a relay status changes. This happens when we get a new v3 consensus (in which case the entries we see are a duplicate of what we see in the NEWCONSENSUS event, below), but it also happens when we decide to mark a relay as up or down in our local status, for example based on connection attempts.

[First added in 0.1.2.3-alpha]

Bandwidth used on an application stream

The syntax is:

    "650" SP "STREAM_BW" SP StreamID SP BytesWritten SP BytesRead SP
        Time CRLF
    BytesWritten = 1*DIGIT
    BytesRead = 1*DIGIT
    Time = ISOTime2Frac

BytesWritten and BytesRead are the number of bytes written and read by the application since the last STREAM_BW event on this stream.

Note that from Tor's perspective, reading a byte on a stream means that the application wrote the byte. That's why the order of "written" vs "read" is opposite for stream_bw events compared to bw events.

The Time field is provided only in versions 0.3.2.1-alpha and later. It records when Tor created the bandwidth event.

These events are generated about once per second per stream; no events are generated for streams that have not written or read. These events apply only to streams entering Tor (such as on a SOCKSPort, TransPort, and so on). They are not generated for exiting streams.
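A minimal parsing sketch for this event follows. The names `StreamBw` and `parse_stream_bw` are hypothetical, and the code assumes the Time value contains no spaces. Because the Time field is missing on older versions, it is treated as optional.

```python
from typing import NamedTuple, Optional


class StreamBw(NamedTuple):
    stream_id: str
    bytes_written: int
    bytes_read: int
    time: Optional[str]  # None when Tor did not send a Time field


def parse_stream_bw(line: str) -> StreamBw:
    """Parse one STREAM_BW event line; written comes before read."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) < 5 or parts[:2] != ["650", "STREAM_BW"]:
        raise ValueError("not a STREAM_BW event: %r" % line)
    time = parts[5] if len(parts) > 5 else None
    return StreamBw(parts[2], int(parts[3]), int(parts[4]), time)
```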

Per-country client stats

The syntax is:

"650" SP "CLIENTS_SEEN" SP TimeStarted SP CountrySummary SP IPVersions CRLF

We just generated a new summary of which countries we've seen clients from recently. The controller could display this for the user, e.g. in their "relay" configuration window, to give them a sense that they are actually being useful.

Currently only bridge relays will receive this event, but once we figure out how to sufficiently aggregate and sanitize the client counts on main relays, we might start sending these events in other cases too.

TimeStarted is a quoted string indicating the time from which the reported summary counts (in UTC).

The CountrySummary keyword has as its argument a comma-separated, possibly empty set of "countrycode=count" pairs. For example (without linebreak),

650-CLIENTS_SEEN TimeStarted="2008-12-25 23:50:43" CountrySummary=us=16,de=8,uk=8

The IPVersions keyword has as its argument a comma-separated set of "protocol-family=count" pairs. For example,

IPVersions=v4=16,v6=40

Note that these values are rounded, not exact. The rounding algorithm is specified in the description of "geoip-client-origins" in dir-spec.txt.
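The "name=count" lists used by CountrySummary and IPVersions can be decoded with a helper like the one below. This is a sketch only; the function name `parse_counted_arg` is hypothetical. Only the first "=" separates the keyword from its argument, and an empty argument yields an empty mapping.

```python
from typing import Dict, Tuple


def parse_counted_arg(arg: str) -> Tuple[str, Dict[str, int]]:
    """Split 'Keyword=a=1,b=2' into ('Keyword', {'a': 1, 'b': 2})."""
    keyword, _, value = arg.partition("=")
    counts = {}
    # filter(None, ...) drops the single empty string produced by an
    # empty argument, so "CountrySummary=" gives an empty dict.
    for pair in filter(None, value.split(",")):
        name, _, count = pair.partition("=")
        counts[name] = int(count)
    return keyword, counts
```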

New consensus networkstatus has arrived

The syntax is:

"650" "+" "NEWCONSENSUS" CRLF 1*NetworkStatus "." CRLF "650" SP "OK" CRLF

A new consensus networkstatus has arrived. We include NS-style lines for every relay in the consensus. NEWCONSENSUS is a separate event from the NS event, because the list here represents every usable relay: so any relay not mentioned in this list is implicitly no longer recommended.

[First added in 0.2.1.13-alpha]

New circuit buildtime has been set

The syntax is:

"650" SP "BUILDTIMEOUT_SET" SP Type SP "TOTAL_TIMES=" Total SP "TIMEOUT_MS=" Timeout SP "XM=" Xm SP "ALPHA=" Alpha SP "CUTOFF_QUANTILE=" Quantile SP "TIMEOUT_RATE=" TimeoutRate SP "CLOSE_MS=" CloseTimeout SP "CLOSE_RATE=" CloseRate CRLF Type = "COMPUTED" / "RESET" / "SUSPENDED" / "DISCARD" / "RESUME" Total = Integer count of timeouts stored Timeout = Integer timeout in milliseconds Xm = Estimated integer Pareto parameter Xm in milliseconds Alpha = Estimated floating point Paredo parameter alpha Quantile = Floating point CDF quantile cutoff point for this timeout TimeoutRate = Floating point ratio of circuits that timeout CloseTimeout = How long to keep measurement circs in milliseconds CloseRate = Floating point ratio of measurement circuits that are closed

A new circuit build timeout has been set. If Type is "COMPUTED", Tor has computed the value based on historical data. If Type is "RESET", initialization or drastic network changes have caused Tor to reset the timeout back to the default and relearn it. If Type is "SUSPENDED", Tor has detected a loss of network connectivity and has temporarily changed the timeout value to the default until the network recovers. If Type is "DISCARD", Tor has decided to discard timeout values that likely happened while the network was down. If Type is "RESUME", Tor has decided to resume timeout calculation.

The Total value is the count of circuit build times Tor used in computing this value. It is capped internally at the maximum number of build times Tor stores (NCIRCUITS_TO_OBSERVE).

The Timeout itself is provided in milliseconds. Internally, Tor rounds this value to the nearest second before using it.
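As a rough illustration, a controller could read the keyword fields and derive the whole-second timeout as follows. The function names and the sample values in the usage are hypothetical. The text above does not say how Tor breaks ties when rounding, so rounding half up is an assumption of this sketch.

```python
from typing import Dict


def parse_buildtimeout_set(line: str) -> Dict[str, str]:
    """Return the Type plus every KEY=VALUE field of a BUILDTIMEOUT_SET line."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) < 3 or parts[:2] != ["650", "BUILDTIMEOUT_SET"]:
        raise ValueError("not a BUILDTIMEOUT_SET event: %r" % line)
    event = {"Type": parts[2]}
    for token in parts[3:]:
        key, sep, value = token.partition("=")
        if sep:
            event[key] = value
    return event


def timeout_seconds(event: Dict[str, str]) -> int:
    """Round TIMEOUT_MS to the nearest second (ties round up: an assumption)."""
    return (int(event["TIMEOUT_MS"]) + 500) // 1000
```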

[First added in 0.2.2.7-alpha]

Signal received

The syntax is:

"650" SP "SIGNAL" SP Signal CRLF

Signal = "RELOAD" / "DUMP" / "DEBUG" / "NEWNYM" / "CLEARDNSCACHE"

A signal has been received and actions taken by Tor. The meaning of each signal, and the mapping to Unix signals, is as defined in section 3.7. Future versions of Tor MAY generate signals other than those listed here; controllers MUST be able to accept them.

If Tor chose to ignore a signal (such as NEWNYM), this event will not be sent. Note that some options (like ReloadTorrcOnSIGHUP) may affect the semantics of the signals here.

Note that the HALT (SIGTERM) and SHUTDOWN (SIGINT) signals do not currently generate any event.

[First added in 0.2.3.1-alpha]

Configuration changed

The syntax is:

StartReplyLine *(MidReplyLine) EndReplyLine

StartReplyLine = "650-CONF_CHANGED" CRLF MidReplyLine = "650-" KEYWORD ["=" VALUE] CRLF EndReplyLine = "650 OK"

Tor configuration options have changed (such as via a SETCONF or RELOAD signal). KEYWORD and VALUE specify the configuration option that was changed. Undefined configuration options contain only the KEYWORD.
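The multi-line reply can be decoded as sketched below. The function name `parse_conf_changed` is hypothetical, and the code assumes the caller has already split the event into lines without their CRLF terminators. Options reported with only a KEYWORD are returned with a value of None, and a list is used rather than a dictionary in case a keyword appears more than once.

```python
from typing import List, Optional, Tuple


def parse_conf_changed(lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (keyword, value) pairs from one CONF_CHANGED event."""
    if not lines or lines[0] != "650-CONF_CHANGED":
        raise ValueError("not a CONF_CHANGED event")
    changes = []
    for line in lines[1:]:
        if line == "650 OK":
            return changes
        if not line.startswith("650-"):
            raise ValueError("unexpected line: %r" % line)
        keyword, sep, value = line[4:].partition("=")
        changes.append((keyword, value if sep else None))
    raise ValueError("event is missing its final '650 OK' line")
```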

Circuit status changed slightly

The syntax is:

"650" SP "CIRC_MINOR" SP CircuitID SP CircEvent [SP Path] [SP "BUILD_FLAGS=" BuildFlags] [SP "PURPOSE=" Purpose] [SP "HS_STATE=" HSState] [SP "REND_QUERY=" HSAddress] [SP "TIME_CREATED=" TimeCreated] [SP "OLD_PURPOSE=" Purpose [SP "OLD_HS_STATE=" HSState]] CRLF CircEvent = "PURPOSE_CHANGED" / ; circuit purpose or HS-related state changed "CANNIBALIZED" ; circuit cannibalized Clients MUST accept circuit events not listed above.

The "OLD_PURPOSE" field is provided for both PURPOSE_CHANGED and CANNIBALIZED events. The "OLD_HS_STATE" field is provided whenever the "OLD_PURPOSE" field is provided and is a hidden-service-related purpose.

Other fields are as specified in section 4.1.1 above.

[First added in 0.2.3.11-alpha]

Pluggable transport launched

The syntax is:

"650" SP "TRANSPORT_LAUNCHED" SP Type SP Name SP TransportAddress SP Port Type = "server" | "client" Name = The name of the pluggable transport TransportAddress = An IPv4 or IPv6 address on which the pluggable transport is listening for connections Port = The TCP port on which it is listening for connections. A pluggable transport called 'Name' of type 'Type' was launched successfully and is now listening for connections on 'Address':'Port'.

Bandwidth used on an OR or DIR or EXIT connection

The syntax is:

"650" SP "CONN_BW" SP "ID=" ConnID SP "TYPE=" ConnType SP "READ=" BytesRead SP "WRITTEN=" BytesWritten CRLF ConnType = "OR" / ; Carrying traffic within the tor network. This can either be our own (client) traffic or traffic we're relaying within the network. "DIR" / ; Fetching tor descriptor data, or transmitting descriptors we're mirroring. "EXIT" ; Carrying traffic between the tor network and an external destination. BytesRead = 1*DIGIT BytesWritten = 1*DIGIT Controllers MUST tolerate unrecognized connection types.

BytesWritten and BytesRead are the number of bytes written and read by Tor since the last CONN_BW event on this connection.

These events are generated about once per second per connection; no events are generated for connections that have not read or written. These events are only generated if TestingTorNetwork is set.

[First added in 0.2.5.2-alpha]

Bandwidth used by all streams attached to a circuit

The syntax is:

"650" SP "CIRC_BW" SP "ID=" CircuitID SP "READ=" BytesRead SP "WRITTEN=" BytesWritten SP "TIME=" Time SP "DELIVERED_READ=" DeliveredBytesRead SP "OVERHEAD_READ=" OverheadBytesRead SP "DELIVERED_WRITTEN=" DeliveredBytesWritten SP "OVERHEAD_WRITTEN=" OverheadBytesWritten SP "SS=" SlowStartState SP "CWND=" CWNDCells SP "RTT=" RTTMilliseconds SP "MIN_RTT=" RTTMilliseconds CRLF BytesRead = 1*DIGIT BytesWritten = 1*DIGIT OverheadBytesRead = 1*DIGIT OverheadBytesWritten = 1*DIGIT DeliveredBytesRead = 1*DIGIT DeliveredBytesWritten = 1*DIGIT SlowStartState = 0 or 1 CWNDCells = 1*DIGIT RTTMilliseconds= 1*DIGIT Time = ISOTime2Frac

BytesRead and BytesWritten are the number of bytes read and written on this circuit since the last CIRC_BW event. These bytes have not necessarily been validated by Tor, and can include invalid cells, dropped cells, and ignored cells (such as padding cells). These values include the relay headers, but not circuit headers.

Circuit data that has been validated and processed by Tor is further broken down into two categories: delivered payloads and overhead. DeliveredBytesRead and DeliveredBytesWritten are the total relay cell payloads transmitted since the last CIRC_BW event, not counting relay cell headers or circuit headers. OverheadBytesRead and OverheadBytesWritten are the extra unused bytes at the end of each cell in order for it to be the fixed CELL_LEN bytes long.

The sum of DeliveredBytesRead and OverheadBytesRead MUST be less than BytesRead, and the same is true for their written counterparts. This sum represents the total relay cell bytes on the circuit that have been validated by Tor, not counting relay headers and cell headers. Subtracting this sum (plus relay cell headers) from the BytesRead (or BytesWritten) value gives the byte count that Tor has decided to reject due to protocol errors, or has otherwise decided to ignore.
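The subtraction described above can be sketched as follows. The function names and the byte counts in the usage are hypothetical. The event does not report relay cell header bytes separately, so this sketch subtracts only the delivered and overhead counts; the result therefore still includes the relay headers of validated cells and is an upper bound on the rejected or ignored bytes, not an exact figure.

```python
from typing import Dict


def parse_circ_bw(line: str) -> Dict[str, str]:
    """Collect the KEY=VALUE fields of one CIRC_BW event line."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) < 3 or parts[:2] != ["650", "CIRC_BW"]:
        raise ValueError("not a CIRC_BW event: %r" % line)
    fields = {}
    for token in parts[2:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def unvalidated_upper_bound(fields: Dict[str, str], direction: str) -> int:
    """READ (or WRITTEN) minus the validated delivered and overhead bytes.

    direction is "READ" or "WRITTEN". Relay cell headers of validated
    cells are not subtracted, so this is only an upper bound.
    """
    total = int(fields[direction])
    delivered = int(fields["DELIVERED_" + direction])
    overhead = int(fields["OVERHEAD_" + direction])
    return total - (delivered + overhead)
```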

The Time field is provided only in versions 0.3.2.1-alpha and later. It records when Tor created the bandwidth event.

The SS, CWND, RTT, and MIN_RTT fields are present only if the circuit has negotiated congestion control to an onion service or Exit hop (any intermediate leaky pipe congestion control hops are not examined here). SS provides an indication if the circuit is in slow start (1), or not (0). CWND is the size of the congestion window in terms of number of cells. RTT is the N_EWMA smoothed current RTT value, and MIN_RTT is the minimum RTT value of the circuit. The SS and CWND fields apply only to the upstream direction of the circuit. The slow start state and CWND values of the other endpoint may be different.

These events are generated about once per second per circuit; no events are generated for circuits that had no attached stream writing or reading.

[First added in 0.2.5.2-alpha]

[DELIVERED_READ, OVERHEAD_READ, DELIVERED_WRITTEN, and OVERHEAD_WRITTEN were added in Tor 0.3.4.0-alpha]

[SS, CWND, RTT, and MIN_RTT were added in Tor 0.4.7.5-alpha]

Per-circuit cell stats

The syntax is:

"650" SP "CELL_STATS" [ SP "ID=" CircuitID ] [ SP "InboundQueue=" QueueID SP "InboundConn=" ConnID ] [ SP "InboundAdded=" CellsByType ] [ SP "InboundRemoved=" CellsByType SP "InboundTime=" MsecByType ] [ SP "OutboundQueue=" QueueID SP "OutboundConn=" ConnID ] [ SP "OutboundAdded=" CellsByType ] [ SP "OutboundRemoved=" CellsByType SP "OutboundTime=" MsecByType ] CRLF CellsByType, MsecByType = CellType ":" 1*DIGIT 0*( "," CellType ":" 1*DIGIT ) CellType = 1*( "a" - "z" / "0" - "9" / "_" ) Examples are: 650 CELL_STATS ID=14 OutboundQueue=19403 OutboundConn=15 OutboundAdded=create_fast:1,relay_early:2 OutboundRemoved=create_fast:1,relay_early:2 OutboundTime=create_fast:0,relay_early:0 650 CELL_STATS InboundQueue=19403 InboundConn=32 InboundAdded=relay:1,created_fast:1 InboundRemoved=relay:1,created_fast:1 InboundTime=relay:0,created_fast:0 OutboundQueue=6710 OutboundConn=18 OutboundAdded=create:1,relay_early:1 OutboundRemoved=create:1,relay_early:1 OutboundTime=create:0,relay_early:0

ID is the locally unique circuit identifier that is only included if the circuit originates at this node.

Inbound and outbound refer to the direction of cell flow through the circuit which is either to origin (inbound) or from origin (outbound).

InboundQueue and OutboundQueue are identifiers of the inbound and outbound circuit queues of this circuit. These identifiers are only unique per OR connection. OutboundQueue is chosen by this node and matches InboundQueue of the next node in the circuit.

InboundConn and OutboundConn are locally unique IDs of inbound and outbound OR connection. OutboundConn does not necessarily match InboundConn of the next node in the circuit.

InboundQueue and InboundConn are not present if the circuit originates at this node. OutboundQueue and OutboundConn are not present if the circuit (currently) ends at this node.

InboundAdded and OutboundAdded are the total numbers of cells, by cell type, added to the inbound and outbound queues. They are present only if at least one cell was added to a queue.

InboundRemoved and OutboundRemoved are the total numbers of cells, by cell type, processed from the inbound and outbound queues. InboundTime and OutboundTime are the total waiting times in milliseconds of all processed cells, by cell type. They are present only if at least one cell was removed from a queue.

These events are generated about once per second per circuit; no events are generated for circuits that have not added or processed any cell. These events are only generated if TestingTorNetwork is set.
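One possible way to decode the per-type fields, and to derive a mean waiting time per processed cell from the Removed and Time fields, is sketched here. The function names are hypothetical, and the mean is a derived figure that Tor does not report itself.

```python
from typing import Dict

BY_TYPE_SUFFIXES = ("Added", "Removed", "Time")


def parse_by_type(value: str) -> Dict[str, int]:
    """Decode 'create_fast:1,relay_early:2' into a {cell type: number} map."""
    result = {}
    for item in value.split(","):
        cell_type, _, number = item.partition(":")
        result[cell_type] = int(number)
    return result


def parse_cell_stats(line: str) -> dict:
    """Parse a CELL_STATS line; identifier fields are kept as strings."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) < 2 or parts[:2] != ["650", "CELL_STATS"]:
        raise ValueError("not a CELL_STATS event: %r" % line)
    stats = {}
    for token in parts[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            continue
        stats[key] = parse_by_type(value) if key.endswith(BY_TYPE_SUFFIXES) else value
    return stats


def mean_wait_ms(stats: dict, direction: str) -> Dict[str, float]:
    """Mean queue time per processed cell; direction is 'Inbound' or 'Outbound'."""
    removed = stats.get(direction + "Removed", {})
    waited = stats.get(direction + "Time", {})
    return {t: waited.get(t, 0) / n for t, n in removed.items() if n}
```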

[First added in 0.2.5.2-alpha]

Token buckets refilled

The syntax is:

"650" SP "TB_EMPTY" SP BucketName [ SP "ID=" ConnID ] SP "READ=" ReadBucketEmpty SP "WRITTEN=" WriteBucketEmpty SP "LAST=" LastRefill CRLF BucketName = "GLOBAL" / "RELAY" / "ORCONN" ReadBucketEmpty = 1*DIGIT WriteBucketEmpty = 1*DIGIT LastRefill = 1*DIGIT Examples are: 650 TB_EMPTY ORCONN ID=16 READ=0 WRITTEN=0 LAST=100 650 TB_EMPTY GLOBAL READ=93 WRITTEN=93 LAST=100 650 TB_EMPTY RELAY READ=93 WRITTEN=93 LAST=100

This event is generated when refilling a previously empty token bucket. The BucketName values "GLOBAL" and "RELAY" are used for the global and relay token buckets; the BucketName "ORCONN" is used for the token buckets of an OR connection. Controllers MUST tolerate unrecognized bucket names.

ConnID is only included if the BucketName is "ORCONN".

If both global and relay buckets and/or the buckets of one or more OR connections run out of tokens at the same time, multiple separate events are generated.

ReadBucketEmpty (WriteBucketEmpty) is the time in milliseconds that the read (write) bucket was empty since the last refill. LastRefill is the time in milliseconds since the last refill.

If a bucket went negative and if refilling tokens didn't make it go positive again, there will be multiple consecutive TB_EMPTY events for each refill interval during which the bucket contained zero tokens or less. In such a case, ReadBucketEmpty or WriteBucketEmpty are capped at LastRefill in order not to report empty times more than once.
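A sketch of reading this event, with the ConnID treated as optional, might look like this. The function name `parse_tb_empty` and the result keys are hypothetical. Unrecognized bucket names are passed through unchanged.

```python
def parse_tb_empty(line: str) -> dict:
    """Parse one TB_EMPTY line into bucket, conn_id, read, written, last."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) < 6 or parts[:2] != ["650", "TB_EMPTY"]:
        raise ValueError("not a TB_EMPTY event: %r" % line)
    # conn_id stays None unless an ID= field is present (ORCONN buckets).
    event = {"bucket": parts[2], "conn_id": None}
    for token in parts[3:]:
        key, sep, value = token.partition("=")
        if not sep:
            continue
        if key == "ID":
            event["conn_id"] = value
        elif key in ("READ", "WRITTEN", "LAST"):
            event[key.lower()] = int(value)  # all three are milliseconds
    return event
```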

These events are only generated if TestingTorNetwork is set.

[First added in 0.2.5.2-alpha]

HiddenService descriptors

The syntax is:

"650" SP "HS_DESC" SP Action SP HSAddress SP AuthType SP HsDir [SP DescriptorID] [SP "REASON=" Reason] [SP "REPLICA=" Replica] [SP "HSDIR_INDEX=" HSDirIndex] Action = "REQUESTED" / "UPLOAD" / "RECEIVED" / "UPLOADED" / "IGNORE" / "FAILED" / "CREATED" HSAddress = 16*Base32Character / 56*Base32Character / "UNKNOWN" AuthType = "NO_AUTH" / "BASIC_AUTH" / "STEALTH_AUTH" / "UNKNOWN" HsDir = LongName / Fingerprint / "UNKNOWN" DescriptorID = 32*Base32Character / 43*Base64Character Reason = "BAD_DESC" / "QUERY_REJECTED" / "UPLOAD_REJECTED" / "NOT_FOUND" / "UNEXPECTED" / "QUERY_NO_HSDIR" / "QUERY_RATE_LIMITED" Replica = 1*DIGIT HSDirIndex = 64*HEXDIG These events will be triggered when required HiddenService descriptor is not found in the cache and a fetch or upload with the network is performed. If the fetch was triggered with only a DescriptorID (using the HSFETCH command for instance), the HSAddress only appears in the Action=RECEIVED since there is no way to know the HSAddress from the DescriptorID thus the value will be "UNKNOWN". If we already had the v0 descriptor, the newly fetched v2 descriptor will be ignored and a "HS_DESC" event with "IGNORE" action will be generated. For HsDir, LongName is always preferred. If HsDir cannot be found in node list at the time event is sent, Fingerprint will be used instead. If Action is "FAILED", Tor SHOULD send Reason field as well. Possible values of Reason are: - "BAD_DESC" - descriptor was retrieved, but found to be unparsable. - "QUERY_REJECTED" - query was rejected by HS directory. - "UPLOAD_REJECTED" - descriptor was rejected by HS directory. - "NOT_FOUND" - HS descriptor with given identifier was not found. - "UNEXPECTED" - nature of failure is unknown. - "QUERY_NO_HSDIR" - No suitable HSDir were found for the query. 
- "QUERY_RATE_LIMITED" - query for this service is rate-limited For "QUERY_NO_HSDIR" or "QUERY_RATE_LIMITED", the HsDir will be set to "UNKNOWN" which was introduced in tor 0.3.1.0-alpha and 0.4.1.0-alpha respectively. If Action is "CREATED", Tor SHOULD send Replica field as well. The Replica field contains the replica number of the generated descriptor. The Replica number is specified in rend-spec.txt section 1.3 and determines the descriptor ID of the descriptor. For hidden service v3, the following applies: The "HSDIR_INDEX=" is an optional field that is only for version 3 which contains the computed index of the HsDir the descriptor was uploaded to or fetched from. The "DescriptorID" key is the descriptor blinded key used for the index value at the "HsDir". The "REPLICA=" field is not used for the "CREATED" event because v3 doesn't use the replica number in the descriptor ID computation. Because client authentication is not yet implemented, the "AuthType" field is always "NO_AUTH". [HS v3 support added 0.3.3.1-alpha]

HiddenService descriptors content

The syntax is:

"650" "+" "HS_DESC_CONTENT" SP HSAddress SP DescId SP HsDir CRLF Descriptor CRLF "." CRLF "650" SP "OK" CRLF HSAddress = 16*Base32Character / 56*Base32Character / "UNKNOWN" DescId = 32*Base32Character / 32*Base64Character HsDir = LongName / "UNKNOWN" Descriptor = The text of the descriptor formatted as specified in rend-spec.txt section 1.3 (v2) or rend-spec-v3.txt section 2.4 (v3) or empty string on failure.

This event is triggered when a successfully fetched HS descriptor is received. The event carries the text of that descriptor. If the HS_DESC event is enabled, this event is sent just after the HS_DESC event with the RECEIVED action.

If a fetch fails, the Descriptor is an empty string and HSAddress is set to "UNKNOWN". The HS_DESC event should be used to get more information on the failed request.

If the fetch fails for the QUERY_NO_HSDIR or QUERY_RATE_LIMITED reason from the HS_DESC event, the HsDir is set to "UNKNOWN". This was introduced in 0.3.1.0-alpha and 0.4.1.0-alpha respectively.

A reply is expected relatively quickly: it takes about as long as fetching something over the Tor network, which can be anywhere from a couple of seconds up to 60 seconds (not a hard limit). In any case, this event carries either the descriptor's content or an empty one.

[HS_DESC_CONTENT was added in Tor 0.2.7.1-alpha] [HS v3 support added 0.3.3.1-alpha]

Network liveness has changed

Syntax:

"650" SP "NETWORK_LIVENESS" SP Status CRLF Status = "UP" / ; The network now seems to be reachable. "DOWN" / ; The network now seems to be unreachable. Controllers MUST tolerate unrecognized status types. [NETWORK_LIVENESS was added in Tor 0.2.7.2-alpha]

Pluggable Transport Logs

Syntax:

"650" SP "PT_LOG" SP PT=Program SP Message

Program = The program path as defined in the *TransportPlugin configuration option. Tor accepts relative and full path.
Message = The log message that the PT sends back to the tor parent process minus the "LOG" string prefix. Formatted as specified in pt-spec.txt section "3.3.4. Pluggable Transport Log Message".

This event is triggered when tor receives a log message from the PT.

Example:

PT (obfs4): LOG SEVERITY=debug MESSAGE="Connected to bridge A"

the resulting control port event would be:

Tor: 650 PT_LOG PT=/usr/bin/obs4proxy SEVERITY=debug MESSAGE="Connected to bridge A"

[PT_LOG was added in Tor 0.4.0.1-alpha]
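The example event above can be taken apart with a short sketch like the one below. The function name `parse_pt_log` is hypothetical. It uses shell-style quoting rules from Python's `shlex` module as an approximation of the quoted-string syntax, and it assumes the program path contains no spaces; neither assumption is guaranteed by this specification.

```python
import shlex
from typing import Dict, Tuple


def parse_pt_log(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (program path, {KEY: value}) for one PT_LOG event line."""
    # shlex.split removes the double quotes around MESSAGE's value.
    tokens = shlex.split(line.strip())
    if (len(tokens) < 3 or tokens[:2] != ["650", "PT_LOG"]
            or not tokens[2].startswith("PT=")):
        raise ValueError("not a PT_LOG event: %r" % line)
    program = tokens[2][len("PT="):]
    fields = {}
    for token in tokens[3:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return program, fields
```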

Pluggable Transport Status

Syntax:

"650" SP "PT_STATUS" SP PT=Program SP TRANSPORT=Transport SP Message

Program = The program path as defined in the *TransportPlugin configuration option. Tor accepts relative and full path.
Transport = This value indicates a hint on what the PT is such as the name or the protocol used for instance.
Message = The status message that the PT sends back to the tor parent process minus the "STATUS" string prefix. Formatted as specified in pt-spec.txt section "3.3.5 Pluggable Transport Status Message".

This event is triggered when tor receives a status message from the PT.

Example:

PT (obfs4): STATUS TRANSPORT=obfs4 CONNECT=Success

the resulting control port event would be:

Tor: 650 PT_STATUS PT=/usr/bin/obs4proxy TRANSPORT=obfs4 CONNECT=Success

[PT_STATUS was added in Tor 0.4.0.1-alpha]

Implementation notes

Authentication

If the control port is open and no authentication operation is enabled, Tor trusts any local user that connects to the control port. This is generally a poor idea.

If the 'CookieAuthentication' option is true, Tor writes a "magic cookie" file named "control_auth_cookie" into its data directory (or to another file specified in the 'CookieAuthFile' option). To authenticate, the controller must demonstrate that it can read the contents of the cookie file:

* Current versions of Tor support cookie authentication using the "COOKIE" authentication method: the controller sends the contents of the cookie file, encoded in hexadecimal. This authentication method exposes the user running a controller to an unintended information disclosure attack whenever the controller has greater filesystem read access than the process that it has connected to. (Note that a controller may connect to a process other than Tor.) It is almost never safe to use, even if the controller's user has explicitly specified which filename to read an authentication cookie from. For this reason, the COOKIE authentication method has been deprecated and will be removed in some future version of Tor.

* 0.2.2.x versions of Tor starting with 0.2.2.36, and all versions of Tor after 0.2.3.12-alpha, support cookie authentication using the "SAFECOOKIE" authentication method, which discloses much less information about the contents of the cookie file.

If the 'HashedControlPassword' option is set, it must contain the salted hash of a secret password. The salted hash is computed according to the S2K algorithm in RFC 2440 (OpenPGP), and prefixed with the s2k specifier. This is then encoded in hexadecimal, prefixed by the indicator sequence "16:". Thus, for example, the password 'foo' could encode to:

16:660537E3E1CD49996044A3BF558097A981F539FEA2F9DA662B4626C1C2
   ++++++++++++++++**^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        salt                     hashed value
                   indicator

You can generate the salted hash of a password by calling

'tor --hash-password <password>'

or by using the example code in the Python and Java controller libraries. To authenticate under this scheme, the controller sends Tor the original secret that was used to generate the password, either as a quoted string or encoded in hexadecimal.
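For illustration, the salted hash layout shown above can be reproduced with the sketch below. It assumes the iterated and salted S2K variant of RFC 2440 with SHA-1 as the hash (consistent with the 40 hexadecimal digits of hashed value in the example), an 8-byte salt, and a one-byte count indicator of 0x60. The function names are hypothetical, and the output has not been checked against 'tor --hash-password', so treat this as a sketch rather than a reference implementation.

```python
import hashlib
import os
from typing import Optional


def s2k_count(indicator: int) -> int:
    """Decode the RFC 2440 one-byte count indicator into a byte count."""
    return (16 + (indicator & 15)) << ((indicator >> 4) + 6)


def hash_control_password(password: str, salt: Optional[bytes] = None,
                          indicator: int = 0x60) -> str:
    """Return '16:' + hex(salt) + hex(indicator) + hex(SHA-1 S2K digest)."""
    if salt is None:
        salt = os.urandom(8)
    if len(salt) != 8:
        raise ValueError("salt must be exactly 8 bytes")
    data = salt + password.encode("utf-8")
    digest = hashlib.sha1()
    # Feed salt+password repeatedly until exactly `count` bytes are hashed.
    remaining = s2k_count(indicator)
    while remaining > 0:
        chunk = data[:remaining]
        digest.update(chunk)
        remaining -= len(chunk)
    specifier = salt + bytes([indicator])
    return "16:" + specifier.hex().upper() + digest.hexdigest().upper()
```

Passing an explicit salt makes the result reproducible, which is useful for tests; in normal use the salt should be random.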

Don't let the buffer get too big.

With old versions of Tor (before 0.2.0.16-alpha), if you ask for lots of events, and 16MB of them queue up on the buffer, the Tor process will close the socket.

Newer Tor versions do not have this 16 MB buffer limit. However, if you leave huge numbers of events unread, Tor may still run out of memory, so you should still be careful about buffer size.

Backward compatibility with v0 control protocol.

The 'version 0' control protocol was replaced in Tor 0.1.1.x. Support was removed in Tor 0.2.0.x. Every non-obsolete version of Tor now supports the version 1 control protocol.

For backward compatibility with the "version 0" control protocol, Tor used to check whether the third octet of the first command is zero. (If it was, Tor assumed that version 0 is in use.)

This compatibility was removed in Tor 0.1.2.16 and 0.2.0.4-alpha.

Tor config options for use by controllers

Tor provides a few special configuration options for use by controllers. These options are not saved to disk by SAVECONF. Most can be set and examined by the SETCONF and GETCONF commands, but some (noted below) can only be given in a torrc file or on the command line.

Generally, these options make Tor unusable by disabling a portion of Tor's normal operations. Unless a controller provides replacement functionality to fill this gap, Tor will not correctly handle user requests.

__AllDirActionsPrivate

If true, Tor will try to launch all directory operations through anonymous connections. (Ordinarily, Tor only tries to anonymize requests related to hidden services.) This option will slow down directory access, and may stop Tor from working entirely if it does not yet have enough directory information to build circuits.

(Boolean. Default: "0".)

__DisablePredictedCircuits

If true, Tor will not launch preemptive "general-purpose" circuits for streams to attach to. (It will still launch circuits for testing and for hidden services.)

(Boolean. Default: "0".)

__LeaveStreamsUnattached

If true, Tor will not automatically attach new streams to circuits; instead, the controller must attach them with ATTACHSTREAM. If the controller does not attach the streams, their data will never be routed.

(Boolean. Default: "0".)

__HashedControlSessionPassword

As HashedControlPassword, but is not saved to the torrc file by SAVECONF. Added in Tor 0.2.0.20-rc.

__ReloadTorrcOnSIGHUP

If this option is true (the default), we reload the torrc from disk every time we get a SIGHUP (from the controller or via a signal). Otherwise, we don't. This option exists so that controllers can keep their options from getting overwritten when a user sends Tor a HUP for some other reason (for example, to rotate the logs).

(Boolean. Default: "1")

__OwningControllerProcess

If this option is set to a process ID, Tor will periodically check whether a process with the specified PID exists, and exit if one does not. Added in Tor 0.2.2.28-beta. This option's intended use is documented in section 3.23 with the related TAKEOWNERSHIP command.

Note that this option can only specify a single process ID, unlike the TAKEOWNERSHIP command which can be sent along multiple control connections.

(String. Default: unset.)

__OwningControllerFD

If this option is a valid socket, Tor will start with an open control connection on this socket. Added in Tor 0.3.3.1-alpha.

This socket will be an owning controller, as if it had already called TAKEOWNERSHIP. It will be automatically authenticated. This option should only be used by other programs that are starting Tor.

This option cannot be changed via SETCONF; it must be set in a torrc or via the command line.

(Integer. Default: -1.)

__DisableSignalHandlers

If this option is set to true during startup, then Tor will not install any signal handlers to watch for POSIX signals. The SIGNAL controller command will still work.

This option is meant for embedding Tor inside another process, when the controlling process would rather handle signals on its own.

This option cannot be changed via SETCONF; it must be set in a torrc or via the command line.

(Boolean. Default: 0.)

Phases from the Bootstrap status event.

[For the bootstrap phases reported by Tor prior to 0.4.0.x, see Section 5.6.]

This section describes the various bootstrap phases currently reported by Tor. Controllers should not assume that the percentages and tags listed here will continue to match up, or even that the tags will stay in the same order. Some phases might also be skipped (not reported) if the associated bootstrap step is already complete, or if the phase no longer is necessary. Only "starting" and "done" are guaranteed to exist in all future versions.

Current Tor versions enter these phases in order, monotonically. Future Tors MAY revisit earlier phases, for example, if the network fails.
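A controller following this advice might track bootstrap state by tag rather than by percentage, as in the sketch below. It assumes the bootstrap status arrives as a STATUS_CLIENT event of the form `650 STATUS_CLIENT NOTICE BOOTSTRAP PROGRESS=... TAG=... SUMMARY="..."`, which is described elsewhere in this specification rather than in this section. The class and function names are hypothetical, and `shlex` is used as an approximation of the quoted-string syntax.

```python
import shlex
from typing import Dict, Optional


def parse_bootstrap(line: str) -> Optional[Dict[str, str]]:
    """Return the KEY=VALUE fields of a BOOTSTRAP status line, else None."""
    tokens = shlex.split(line.strip())
    if (len(tokens) < 4 or tokens[0] != "650"
            or tokens[1] != "STATUS_CLIENT" or tokens[3] != "BOOTSTRAP"):
        return None
    fields = {}
    for token in tokens[4:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


class BootstrapTracker:
    """Remember the latest phase without assuming monotonic progress."""

    def __init__(self) -> None:
        self.tag: Optional[str] = None
        self.progress: Optional[int] = None
        self.done = False

    def update(self, line: str) -> None:
        fields = parse_bootstrap(line)
        if fields is None or "TAG" not in fields:
            return
        self.tag = fields["TAG"]
        self.progress = int(fields["PROGRESS"]) if "PROGRESS" in fields else None
        # Only the "done" tag marks completion; a later event with an
        # earlier tag clears it again, since phases may be revisited.
        self.done = self.tag == "done"
```

Keying completion on the "done" tag avoids depending on any particular percentage or on intermediate tags, which may be skipped or renumbered.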

Overview of Bootstrap reporting.

Bootstrap phases can be viewed as belonging to one of three stages:

Initial connection to a Tor relay or bridge

Obtaining directory information

Building an application circuit

Tor doesn't specifically enter Stage 1; that is a side effect of other actions that Tor is taking. Tor could be making a connection to a fallback directory server, or it could be making a connection to a guard candidate. Either one counts as Stage 1 for the purposes of bootstrap reporting.

Stage 2 might involve Tor contacting directory servers, or it might involve reading cached directory information from a previous session. Large parts of Stage 2 might be skipped if there is already enough cached directory information to build circuits. Tor will defer reporting progress in Stage 2 until Stage 1 is complete.

Tor defers this reporting because Tor can already have enough directory information to build circuits, yet not be able to connect to a relay. Without that deferral, a user might misleadingly see Tor stuck at a large amount of progress when something as fundamental as making a TCP connection to any relay is failing.

Tor also doesn't specifically enter Stage 3; that is a side effect of Tor building circuits for some purpose or other. In a typical client, Tor builds predicted circuits to provide lower latency for application connection requests. In Stage 3, Tor might make new connections to relays or bridges that it did not connect to in Stage 1.

Phases in Bootstrap Stage 1.

Phase 0: tag=starting summary="Starting"

Tor starts out in this phase.

Phase 1: tag=conn_pt summary="Connecting to pluggable transport" [This phase is new in 0.4.0.x]

Tor is making a TCP connection to the transport plugin for a pluggable transport. Tor will use this pluggable transport to make its first connection to a bridge.

Phase 2: tag=conn_done_pt summary="Connected to pluggable transport" [New in 0.4.0.x]

Tor has completed its TCP connection to the transport plugin for the pluggable transport.

Phase 3: tag=conn_proxy summary="Connecting to proxy" [New in 0.4.0.x]

Tor is making a TCP connection to a proxy to make its first connection to a relay or bridge.

Phase 4: tag=conn_done_proxy summary="Connected to proxy" [New in 0.4.0.x]

Tor has completed its TCP connection to a proxy to make its first connection to a relay or bridge.

Phase 5: tag=conn summary="Connecting to a relay" [New in 0.4.0.x; prior versions of Tor had a "conn_dir" phase that sometimes but not always corresponded to connecting to a directory server]

Tor is making its first connection to a relay. This might be through a pluggable transport or proxy connection that Tor has already established.

Phase 10: tag=conn_done summary="Connected to a relay" [New in 0.4.0.x]

Tor has completed its first connection to a relay.

Phase 14: tag=handshake summary="Handshaking with a relay" [New in 0.4.0.x; prior versions of Tor had a "handshake_dir" phase]

Tor is in the process of doing a TLS handshake with a relay.

Phase 15: tag=handshake_done summary="Handshake with a relay done" [New in 0.4.0.x]

Tor has completed its TLS handshake with a relay.

Phases in Bootstrap Stage 2.

Phase 20: tag=onehop_create summary="Establishing an encrypted directory connection" [prior to 0.4.0.x, this was numbered 15]

Once TLS is finished with a relay, Tor will send a CREATE_FAST cell to establish a one-hop circuit for retrieving directory information. It will remain in this phase until it receives the CREATED_FAST cell back, indicating that the circuit is ready.

Phase 25: tag=requesting_status summary="Asking for networkstatus consensus" [prior to 0.4.0.x, this was numbered 20]

Once we've finished our one-hop circuit, we will start a new stream for fetching the networkstatus consensus. We'll stay in this phase until we get the 'connected' relay cell back, indicating that we've established a directory connection.

Phase 30: tag=loading_status summary="Loading networkstatus consensus" [prior to 0.4.0.x, this was numbered 25]

Once we've established a directory connection, we will start fetching the networkstatus consensus document. This could take a while; this phase is a good opportunity for using the "progress" keyword to indicate partial progress.

This phase could stall if the directory server we picked doesn't have a copy of the networkstatus consensus so we have to ask another, or it does give us a copy but we don't find it valid.

Phase 40: tag=loading_keys summary="Loading authority key certs"

Sometimes when we've finished loading the networkstatus consensus, we find that we don't have all the authority key certificates for the keys that signed the consensus. At that point we put the consensus we fetched on hold and fetch the keys so we can verify the signatures.

Phase 45: tag=requesting_descriptors summary="Asking for relay descriptors"

Once we have a valid networkstatus consensus and we've checked all its signatures, we start asking for relay descriptors. We stay in this phase until we have received a 'connected' relay cell in response to a request for descriptors.

[Some versions of Tor (starting with 0.2.6.2-alpha but before 0.4.0.x): Tor could report having internal paths only; see Section 5.6]

Phase 50: tag=loading_descriptors summary="Loading relay descriptors"

We will ask for relay descriptors from several different locations, so this step will probably make up the bulk of the bootstrapping, especially for users with slow connections. We stay in this phase until we have descriptors for a significant fraction of the usable relays listed in the networkstatus consensus (this can be between 25% and 95% depending on Tor's configuration and network consensus parameters). This phase is also a good opportunity to use the "progress" keyword to indicate partial steps.

[Some versions of Tor (starting with 0.2.6.2-alpha but before 0.4.0.x): Tor could report having internal paths only; see Section 5.6]

Phase 75: tag=enough_dirinfo summary="Loaded enough directory info to build circuits" [New in 0.4.0.x; previously, Tor would misleadingly report the "conn_or" tag once it had enough directory info.]

Phases in Bootstrap Stage 3.

Phase 76: tag=ap_conn_pt summary="Connecting to pluggable transport to build circuits" [New in 0.4.0.x]

This is similar to conn_pt, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 77: tag=ap_conn_done_pt summary="Connected to pluggable transport to build circuits" [New in 0.4.0.x]

This is similar to conn_done_pt, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 78: tag=ap_conn_proxy summary="Connecting to proxy to build circuits" [New in 0.4.0.x]

This is similar to conn_proxy, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 79: tag=ap_conn_done_proxy summary="Connected to proxy to build circuits" [New in 0.4.0.x]

This is similar to conn_done_proxy, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 80: tag=ap_conn summary="Connecting to a relay to build circuits" [New in 0.4.0.x]

This is similar to conn, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 85: tag=ap_conn_done summary="Connected to a relay to build circuits" [New in 0.4.0.x]

This is similar to conn_done, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 89: tag=ap_handshake summary="Finishing handshake with a relay to build circuits" [New in 0.4.0.x]

This is similar to handshake, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 90: tag=ap_handshake_done summary="Handshake finished with a relay to build circuits" [New in 0.4.0.x]

This is similar to handshake_done, except for making connections to additional relays or bridges that Tor needs to use to build application circuits.

Phase 95: tag=circuit_create summary="Establishing a[n internal] Tor circuit" [prior to 0.4.0.x, this was numbered 90]

Once we've finished our TLS handshake with the first hop of a circuit, we will set about trying to make some 3-hop circuits in case we need them soon.

[Some versions of Tor (starting with 0.2.6.2-alpha but before 0.4.0.x): Tor could report having internal paths only; see Section 5.6]

Phase 100: tag=done summary="Done"

A full 3-hop circuit has been established. Tor is ready to handle application connections now.

[Some versions of Tor (starting with 0.2.6.2-alpha but before 0.4.0.x): Tor could report having internal paths only; see Section 5.6]

Bootstrap phases reported by older versions of Tor

These phases were reported by Tor older than 0.4.0.x. For newer versions of Tor, see Section 5.5.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. When bootstrap completes, Tor will be ready to handle an application requesting an exit circuit to services like the World Wide Web.

If the consensus does not contain Exits, Tor will only build internal circuits. In this case, earlier statuses will have included "internal" as indicated above. When bootstrap completes, Tor will be ready to handle an application requesting an internal circuit to hidden services at ".onion" addresses.

If a future consensus contains Exits, exit circuits may become available.]

Phase 0: tag=starting summary="Starting"

Tor starts out in this phase.

Phase 5: tag=conn_dir summary="Connecting to directory server"

Tor sends this event as soon as Tor has chosen a directory server -- e.g. one of the authorities if bootstrapping for the first time or after a long downtime, or one of the relays listed in its cached directory information otherwise.

Tor will stay at this phase until it has successfully established a TCP connection with some directory server. Problems in this phase generally happen because Tor doesn't have a network connection, or because the local firewall is dropping SYN packets.

Phase 10: tag=handshake_dir summary="Finishing handshake with directory server"

This event occurs when Tor establishes a TCP connection with a relay or authority used as a directory server (or its https proxy if it's using one). Tor remains in this phase until the TLS handshake with the relay or authority is finished.

Problems in this phase generally happen because Tor's firewall is doing more sophisticated MITM attacks on it, or doing packet-level keyword recognition of Tor's handshake.

Phase 15: tag=onehop_create summary="Establishing an encrypted directory connection"

Once TLS is finished with a relay, Tor will send a CREATE_FAST cell to establish a one-hop circuit for retrieving directory information. It will remain in this phase until it receives the CREATED_FAST cell back, indicating that the circuit is ready.

Phase 20: tag=requesting_status summary="Asking for networkstatus consensus"

Once we've finished our one-hop circuit, we will start a new stream for fetching the networkstatus consensus. We'll stay in this phase until we get the 'connected' relay cell back, indicating that we've established a directory connection.

Phase 25: tag=loading_status summary="Loading networkstatus consensus"

Once we've established a directory connection, we will start fetching the networkstatus consensus document. This could take a while; this phase is a good opportunity for using the "progress" keyword to indicate partial progress.

This phase could stall if the directory server we picked doesn't have a copy of the networkstatus consensus so we have to ask another, or it does give us a copy but we don't find it valid.

Phase 40: tag=loading_keys summary="Loading authority key certs"

Sometimes when we've finished loading the networkstatus consensus, we find that we don't have all the authority key certificates for the keys that signed the consensus. At that point we put the consensus we fetched on hold and fetch the keys so we can verify the signatures.

Phase 45: tag=requesting_descriptors summary="Asking for relay descriptors [ for internal paths]"

Once we have a valid networkstatus consensus and we've checked all its signatures, we start asking for relay descriptors. We stay in this phase until we have received a 'connected' relay cell in response to a request for descriptors.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will ask for descriptors for both exit and internal paths. If not, Tor will only ask for descriptors for internal paths. In this case, this status will include "internal" as indicated above.]

Phase 50: tag=loading_descriptors summary="Loading relay descriptors[ for internal paths]"

We will ask for relay descriptors from several different locations, so this step will probably make up the bulk of the bootstrapping, especially for users with slow connections. We stay in this phase until we have descriptors for a significant fraction of the usable relays listed in the networkstatus consensus (this can be between 25% and 95% depending on Tor's configuration and network consensus parameters). This phase is also a good opportunity to use the "progress" keyword to indicate partial steps.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will download descriptors for both exit and internal paths. If not, Tor will only download descriptors for internal paths. In this case, this status will include "internal" as indicated above.]

Phase 80: tag=conn_or summary="Connecting to the Tor network[ internally]"

Once we have a valid consensus and enough relay descriptors, we choose entry guard(s) and start trying to build some circuits. This step is similar to the "conn_dir" phase above; the only difference is the context.

If a Tor starts with enough recent cached directory information, its first bootstrap status event will be for the conn_or phase.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. If not, Tor will only build internal circuits. In this case, this status will include "internal(ly)" as indicated above.]

Phase 85: tag=handshake_or summary="Finishing handshake with first hop[ of internal circuit]"

This phase is similar to the "handshake_dir" phase, but it gets reached if we finish a TCP connection to a Tor relay and we have already reached the "conn_or" phase. We'll stay in this phase until we complete a TLS handshake with a Tor relay.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor may be finishing a handshake with the first hop of either an exit or internal circuit. In this case, it won't specify which type. If the consensus contains no Exits, Tor will only build internal circuits. In this case, this status will include "internal" as indicated above.]

Phase 90: tag=circuit_create summary="Establishing a[n internal] Tor circuit"

Once we've finished our TLS handshake with the first hop of a circuit, we will set about trying to make some 3-hop circuits in case we need them soon.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. If not, Tor will only build internal circuits. In this case, this status will include "internal" as indicated above.]

Phase 100: tag=done summary="Done"

A full 3-hop circuit has been established. Tor is ready to handle application connections now.

[Newer versions of Tor (0.2.6.2-alpha and later): If the consensus contains Exits (the typical case), Tor will build both exit and internal circuits. At this stage, Tor will be ready to handle an application requesting an exit circuit to services like the World Wide Web.

If the consensus does not contain Exits, Tor will only build internal circuits. In this case, earlier statuses will have included "internal" as indicated above. At this stage, Tor will be ready to handle an application requesting an internal circuit to hidden services at ".onion" addresses.

If a future consensus contains Exits, exit circuits may become available.]
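As an illustration of the renumbering noted in the two phase lists above, the following Python sketch records the phase number reported for each tag before and after 0.4.0.x and lists the tags whose number changed. It covers only the phases that appear in this part of the document; the table and function names are hypothetical and are not part of Tor or of the control protocol.

```python
# Phase numbers per tag, copied from the phase lists above.
# Only phases listed in this part of the document are included.
OLD_PHASES = {  # Tor older than 0.4.0.x
    "starting": 0, "conn_dir": 5, "handshake_dir": 10, "onehop_create": 15,
    "requesting_status": 20, "loading_status": 25, "loading_keys": 40,
    "requesting_descriptors": 45, "loading_descriptors": 50,
    "conn_or": 80, "handshake_or": 85, "circuit_create": 90, "done": 100,
}
NEW_PHASES = {  # Tor 0.4.0.x and later
    "conn_done": 10, "handshake": 14, "handshake_done": 15,
    "onehop_create": 20, "requesting_status": 25, "loading_status": 30,
    "loading_keys": 40, "requesting_descriptors": 45,
    "loading_descriptors": 50, "enough_dirinfo": 75,
    "ap_conn_pt": 76, "ap_conn_done_pt": 77, "ap_conn_proxy": 78,
    "ap_conn_done_proxy": 79, "ap_conn": 80, "ap_conn_done": 85,
    "ap_handshake": 89, "ap_handshake_done": 90,
    "circuit_create": 95, "done": 100,
}


def renumbered_tags(old=OLD_PHASES, new=NEW_PHASES):
    """Return {tag: (old_number, new_number)} for tags in both lists
    whose phase number differs between them."""
    return {
        tag: (old[tag], new[tag])
        for tag in old
        if tag in new and old[tag] != new[tag]
    }


if __name__ == "__main__":
    for tag, (before, after) in sorted(renumbered_tags().items()):
        print(f"{tag}: {before} -> {after}")
```

Because several tags keep their name but change their number, a controller that needs to work across Tor versions may find it more robust to key its logic on the tag than on the phase number.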

HOW TOR VERSION NUMBERS WORK

Table of Contents

    1. The Old Way
    2. The New Way
    3. Version status.

The Old Way

Before 0.1.0, versions were of the format:

MAJOR.MINOR.MICRO(status(PATCHLEVEL))?(-cvs)?

where MAJOR, MINOR, MICRO, and PATCHLEVEL are numbers, status is one of "pre" (for an alpha release), "rc" (for a release candidate), or "." for a release. As a special case, "a.b.c" was equivalent to "a.b.c.0". We compare the elements in order (major, minor, micro, status, patchlevel, cvs), with "cvs" preceding non-cvs.

We would start each development branch with a final version in mind: say, "0.0.8". Our first pre-release would be "0.0.8pre1", followed by (for example) "0.0.8pre2-cvs", "0.0.8pre2", "0.0.8pre3-cvs", "0.0.8rc1", "0.0.8rc2-cvs", and "0.0.8rc2". Finally, we'd release 0.0.8. The stable CVS branch would then be versioned "0.0.8.1-cvs", and any eventual bugfix release would be "0.0.8.1".
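The old format and its comparison rule can be sketched in Python as follows. This is a minimal illustration, not Tor's own parser: the function name is hypothetical, and the ordering of the status values ("pre" before "rc" before ".") is inferred from the release sequence in the example above rather than stated as a rule.

```python
import re

# MAJOR.MINOR.MICRO(status(PATCHLEVEL))?(-cvs)?
_OLD_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:(pre|rc|\.)(\d+))?(-cvs)?")

# Assumed ordering of status values, inferred from the example sequence.
_STATUS_RANK = {"pre": 0, "rc": 1, ".": 2}


def old_version_key(text):
    """Return a tuple that sorts old-style (pre-0.1.0) versions in the
    order (major, minor, micro, status, patchlevel, cvs)."""
    match = _OLD_VERSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not an old-style version: {text!r}")
    major, minor, micro, status, patchlevel, cvs = match.groups()
    if status is None:
        # Special case: "a.b.c" is equivalent to "a.b.c.0".
        status, patchlevel = ".", "0"
    return (
        int(major), int(minor), int(micro),
        _STATUS_RANK[status], int(patchlevel),
        0 if cvs else 1,  # "cvs" precedes non-cvs
    )


if __name__ == "__main__":
    versions = ["0.0.8", "0.0.8rc1", "0.0.8pre2", "0.0.8pre2-cvs"]
    print(sorted(versions, key=old_version_key))
```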

The New Way

Starting at 0.1.0.1-rc, versions are of the format:

MAJOR.MINOR.MICRO[.PATCHLEVEL][-STATUS_TAG][ (EXTRA_INFO)]*

The parts in square brackets are optional. As before, MAJOR, MINOR, MICRO, and PATCHLEVEL are numbers, with an absent number equivalent to 0. All versions should be distinguishable purely by those four numbers.

The STATUS_TAG is purely informational, and lets you know how stable we think the release is: "alpha" is pretty unstable; "rc" is a release candidate; and no tag at all means that we have a final release. If the tag ends with "-cvs" or "-dev", you're looking at a development snapshot that came after a given release. If we do encounter two versions that differ only by status tag, we compare them lexically. The STATUS_TAG can't contain whitespace.

The EXTRA_INFO is also purely informational, often containing information about the SCM commit this version came from. It is surrounded by parentheses and can't contain whitespace. Unlike the STATUS_TAG this never impacts the way that versions should be compared. EXTRA_INFO may appear any number of times. Tools should generally not parse EXTRA_INFO entries.

Now, we start each development branch with (say) 0.1.1.1-alpha. The patchlevel increments consistently as the status tag changes, for example, as in: 0.1.1.2-alpha, 0.1.1.3-alpha, 0.1.1.4-rc, 0.1.1.5-rc. Eventually, we release 0.1.1.6. The next patch release is 0.1.1.7.

Between these releases, CVS is versioned with a -cvs tag: after 0.1.1.1-alpha comes 0.1.1.1-alpha-cvs, and so on. But starting with 0.1.2.1-alpha-dev, we switched to SVN and started using the "-dev" suffix instead of the "-cvs" suffix.
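A minimal Python sketch of parsing and ordering new-style versions is shown below. The function name is hypothetical. It compares the four numbers first and falls back to a lexical comparison of the STATUS_TAG; treating an absent tag as the empty string is an assumption of this sketch, since the text above only says that tags are compared lexically. EXTRA_INFO is accepted but ignored.

```python
import re

# MAJOR.MINOR.MICRO[.PATCHLEVEL][-STATUS_TAG][ (EXTRA_INFO)]*
_NEW_VERSION = re.compile(
    r"(\d+)\.(\d+)\.(\d+)"   # MAJOR.MINOR.MICRO
    r"(?:\.(\d+))?"          # optional .PATCHLEVEL
    r"(?:-(\S+))?"           # optional -STATUS_TAG (no whitespace)
    r"(?: \(\S*\))*"         # any number of " (EXTRA_INFO)" entries
)


def new_version_key(text):
    """Return (major, minor, micro, patchlevel, status_tag) for sorting.

    An absent PATCHLEVEL counts as 0. An absent STATUS_TAG is represented
    as "" (an assumption of this sketch). EXTRA_INFO never affects the key.
    """
    match = _NEW_VERSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a new-style version: {text!r}")
    major, minor, micro, patchlevel, status_tag = match.groups()
    return (
        int(major), int(minor), int(micro),
        int(patchlevel or 0),
        status_tag or "",
    )


if __name__ == "__main__":
    print(new_version_key("0.1.1.1-alpha-cvs"))
    print(new_version_key("0.4.1.2-alpha (git-abc123)"))
```

With this key, "0.1.1.1-alpha" sorts before "0.1.1.1-alpha-cvs", which matches the description of a "-cvs" or "-dev" snapshot as coming after the release it is based on.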

Version status.

Sometimes we need to determine whether a Tor version is obsolete, experimental, or neither, based on a list of recommended versions. The logic is as follows:

  * If a version is listed on the recommended list, then it is "recommended".

  * If a version is newer than every recommended version, that version is "experimental" or "new".

  * If a version is older than every recommended version, it is "obsolete" or "old".

  * The first three components (major,minor,micro) of a version number are its "release series". If a version has other recommended versions with the same release series, and the version is newer than all such recommended versions, but it is not newer than _every_ recommended version, then the version is "new in series".

  * Finally, if none of the above conditions hold, then the version is "un-recommended."
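The version status rules above can be sketched in Python as follows. Versions are represented as (major, minor, micro, patchlevel) tuples for simplicity; the function name and the returned strings are illustrative choices of this sketch, and the behaviour for an empty recommended list is not defined by the rules above.

```python
def version_status(version, recommended):
    """Classify `version` against a non-empty list of recommended versions.

    Both are (major, minor, micro, patchlevel) tuples, so ordinary tuple
    comparison gives "newer" and "older".
    """
    if version in recommended:
        return "recommended"
    if all(version > r for r in recommended):
        return "new"
    if all(version < r for r in recommended):
        return "old"
    # The release series is the first three components.
    same_series = [r for r in recommended if r[:3] == version[:3]]
    if same_series and all(version > r for r in same_series):
        return "new in series"
    return "unrecommended"


if __name__ == "__main__":
    recommended = [(0, 3, 5, 8), (0, 4, 0, 5)]
    for candidate in [(0, 3, 5, 8), (0, 4, 1, 1), (0, 3, 4, 1),
                      (0, 3, 5, 9), (0, 4, 0, 4)]:
        print(candidate, version_status(candidate, recommended))
```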

Tor Bandwidth File Format

juga
teor

Table of Contents

    1. Scope and preliminaries
        1.2. Acknowledgements
        1.3. Outline
        1.4. Format Versions
    2. Format details
        2.1. Definitions
        2.2. Header List format
        2.3. Relay Line format
        2.4. Implementation details
            2.4.1. Writing bandwidth files atomically
            2.4.2. Additional KeyValue pair definitions
                2.4.2.1. Simple Bandwidth Scanner
                2.4.2.2. Torflow
    A. Sample data
        A.1. Generated by Torflow
        A.2. Generated by sbws version 0.1.0
        A.3. Generated by sbws version 1.0.3
        A.4. Headers generated by sbws version 1.0.4
        A.5. Generated by sbws version 1.1.0
    B. Scaling bandwidths
        B.1. Scaling requirements
        B.2. A linear scaling method
        B.3. Quota changes
        B.4. Torflow aggregation

Scope and preliminaries

This document describes the format of Tor's Bandwidth File, version 1.0.0 and later.

It is a new specification for the existing bandwidth file format, which we call version 1.0.0. It also specifies new format versions 1.1.0 and later, which are backwards compatible with 1.0.0 parsers.

Since Tor version 0.2.4.12-alpha, the directory authorities use the Bandwidth File called "V3BandwidthsFile" generated by Torflow [1]. The details of this format are described in Torflow's README.spec.txt. We also summarise the format in this specification.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Acknowledgements

The original bandwidth generator (Torflow) and format were created by mike. Teor suggested writing this specification while contributing to pastly's new bandwidth generator implementation.

This specification was revised after feedback from:

Nick Mathewson (nickm)
Iain Learmonth (irl)

Outline

The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1 and 3.4.2, uses the term "bandwidth measurements" to refer to what is here called a Bandwidth File.

A Bandwidth File contains information on relays' bandwidth capacities and is produced by bandwidth generators, previously known as bandwidth scanners.

Format Versions

1.0.0 - The legacy Bandwidth File format

1.1.0 - Adds a header containing information about the bandwidth file. Documents the sbws and Torflow relay line keys.

1.2.0 - If there are not enough eligible relays, the bandwidth file SHOULD contain a header, but no relays. (To match Torflow's existing behaviour.) Adds scanner and destination countries to the header. Adds new KeyValue Lines to the Header List section with statistics about the number of relays included in the file. Adds new KeyValues to Relay Bandwidth Lines, with different bandwidth values (averages and descriptor bandwidths).

1.4.0 - Adds monitoring KeyValues to the header and relay lines. RelayLines for excluded relays MAY be present in the bandwidth file for diagnostic reasons. Similarly, if there are not enough eligible relays, the bandwidth file MAY contain all known relays. Diagnostic relay lines SHOULD be marked with vote=0, and Tor SHOULD NOT use their bandwidths in its votes. Also adds Tor version.

1.5.0 - Removes "recent_measurement_attempt_count" KeyValue.

1.6.0 - Adds congestion control stream events KeyValues.

1.7.0 - Adds ratios KeyValues to the relay lines and network averages KeyValues to the header.

All Tor versions can consume format version 1.0.0.

All Tor versions can consume format version 1.1.0 and later, but Tor versions earlier than 0.3.5.1-alpha warn if the header contains any KeyValue lines after the Timestamp.

Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not understand "vote=0". Instead, they will vote for the actual bandwidths that sbws puts in diagnostic relay lines:

  * 1 for relays with "unmeasured=1", and
  * the relay's measured and scaled bandwidth when "under_min_report=1".

Format details

The Bandwidth File MUST contain the following sections:

  - Header List (exactly once), which is a partially ordered list of
    - Header Lines (one or more times), then
  - Relay Lines (zero or more times), in an arbitrary order.

If it does not contain these sections, parsers SHOULD ignore the file.

Definitions

The following nonterminals are defined in Tor directory protocol sections 1.2., 2.1.1., 2.1.3.:

    bool
    Int
    SP (space)
    NL (newline)
    KeywordChar
    ArgumentChar
    nickname
    hexdigest (a '$', followed by 40 hexadecimal characters ([A-Fa-f0-9]))

Nonterminal defined in section 2 of version-spec.txt [4]:

    version_number

We define the following nonterminals:

    Line ::= ArgumentChar* NL
    RelayLine ::= KeyValue (SP KeyValue)* NL
    HeaderLine ::= KeyValue NL
    KeyValue ::= Key "=" Value
    Key ::= (KeywordChar | "_")+
    Value ::= ArgumentCharValue+
    ArgumentCharValue ::= any printing ASCII character except NL and SP.
    Terminator ::= "=====" or "===="
                   Generators SHOULD use a 5-character terminator.
    Timestamp ::= Int
    Bandwidth ::= Int
    MasterKey ::= a base64-encoded Ed25519 public key, with padding characters omitted.
    DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
    CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country (eg the destination is in a Content Delivery Network).
    CountryCodeList ::= One or more CountryCode(s) separated by a comma ([A-Z]{2}(,[A-Z]{2})*).
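Several of the nonterminals above can be written as regular expressions. The Python sketch below is illustrative only. It assumes that KeywordChar, which is defined in the Tor directory protocol rather than here, is an ASCII letter, digit, or "-"; all names in the sketch are hypothetical.

```python
import re

# Key ::= (KeywordChar | "_")+
# Assumption: KeywordChar is [A-Za-z0-9-], per the directory protocol.
KEY = r"[A-Za-z0-9_-]+"
# Value ::= ArgumentCharValue+  (printing ASCII except SP and NL)
VALUE = r"[!-~]+"

KEYVALUE_RE = re.compile(rf"({KEY})=({VALUE})")
COUNTRY_CODE_LIST_RE = re.compile(r"[A-Z]{2}(,[A-Z]{2})*")
# Checks only the shape "YYYY-MM-DDTHH:MM:SS", not calendar validity.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
TERMINATORS = ("=====", "====")


def parse_keyvalue(text):
    """Return (key, value) if `text` is exactly one KeyValue, else None."""
    match = KEYVALUE_RE.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_keyvalue("bw=760"))
    print(parse_keyvalue("software_version=1.0.3"))
    print(bool(COUNTRY_CODE_LIST_RE.fullmatch("US,ZZ")))
```

Note that a Value may itself contain "=", so the sketch splits on the first "=" only.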

Note that key_value and value are defined in Tor directory protocol with different formats to KeyValue and Value here.

Tor versions earlier than 0.3.5.1-alpha require all lines in the file to be 510 characters or less. The previous limit was 254 characters in Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines.

Note that directory authorities are only supported on the two most recent stable Tor versions, so we expect that line limits will be removed after Tor 0.4.0 is released in 2019.

Header List format

It consists of a Timestamp line and zero or more HeaderLines.

All the header lines MUST conform to the HeaderLine format, except the first Timestamp line.

The Timestamp line is not a HeaderLine to keep compatibility with the legacy Bandwidth File format.

Some header Lines MUST appear in specific positions, as documented below. All other Lines can appear in any order.

If a parser does not recognize any extra material in a header Line, the Line MUST be ignored.

If a header Line does not conform to this format, the Line SHOULD be ignored by parsers.

It consists of:

Timestamp NL

[At start, exactly once.]

The Unix Epoch time in seconds of the most recent generator bandwidth result.

If the generator implementation has multiple threads or subprocesses which can fail independently, it SHOULD take the most recent timestamp from each thread and use the oldest value. This ensures all the threads continue running.

If there are threads that do not run continuously, they SHOULD be excluded from the timestamp calculation.

If there are no recent results, the generator MUST NOT generate a new file.

It does not follow the KeyValue format for backwards compatibility with version 1.0.0.
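The rule for choosing the Timestamp when a generator has several independent threads can be sketched as below. The function name and the input layout (a mapping from a thread name to the Unix times of that thread's results) are assumptions of this sketch. The caller is expected to leave out threads that do not run continuously.

```python
def file_timestamp(result_times_by_thread):
    """Return the Timestamp for a new bandwidth file, or None.

    Takes the most recent result time from each thread and uses the
    oldest of those values. Returns None when there are no results at
    all, in which case no new file should be generated.
    """
    latest_per_thread = [
        max(times) for times in result_times_by_thread.values() if times
    ]
    if not latest_per_thread:
        return None
    return min(latest_per_thread)


if __name__ == "__main__":
    # Thread "b" stopped producing results at 200, so the file is
    # stamped 200 even though thread "a" is more recent.
    print(file_timestamp({"a": [100, 250], "b": [90, 200]}))
```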

"version" version_number NL

[In second position, zero or one time.]

The specification document format version. It uses semantic versioning [5].

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the version_number is considered to be "1.0.0".

"software" Value NL

[Zero or one time.]

The name of the software that created the document.

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the software is considered to be "torflow".

"software_version" Value NL

[Zero or one time.]

The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme.

This Line was added in version 1.1.0 of this specification.

"file_created" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone when the file was created.

This Line was added in version 1.1.0 of this specification.

"generator_started" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone when the generator started.

This Line was added in version 1.1.0 of this specification.

"earliest_bandwidth" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone when the first relay bandwidth was obtained.

This Line was added in version 1.1.0 of this specification.

"latest_bandwidth" DateTime NL

[Zero or one time.]

The date and time timestamp in ISO 8601 format and UTC time zone of the most recent generator bandwidth result.

This time MUST be identical to the initial Timestamp line.

This duplicate value is included to make the format easier for people to read.

This Line was added in version 1.1.0 of this specification.
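Since "latest_bandwidth" repeats the initial Timestamp in DateTime form, a generator or a checker can derive one from the other. The following Python sketch shows one way to do so; the function names are hypothetical.

```python
from datetime import datetime, timezone


def unix_to_datetime(timestamp):
    """Format a Unix time in seconds as "YYYY-MM-DDTHH:MM:SS" in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def latest_bandwidth_matches(timestamp, latest_bandwidth):
    """Check that the "latest_bandwidth" DateTime equals the Timestamp line."""
    return unix_to_datetime(timestamp) == latest_bandwidth


if __name__ == "__main__":
    print(unix_to_datetime(1500000000))
```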

"number_eligible_relays" Int NL

[Zero or one time.]

The number of relays that have enough measurements to be included in the bandwidth file.

This Line was added in version 1.2.0 of this specification.

"minimum_percent_eligible_relays" Int NL

[Zero or one time.]

The percentage of relays in the consensus that SHOULD be included in every generated bandwidth file.

If this threshold is not reached, format versions 1.3.0 and earlier SHOULD NOT contain any relays. (Bandwidth files always include a header.)

Format versions 1.4.0 and later SHOULD include all the relays for diagnostic purposes, even if this threshold is not reached. But these relays SHOULD be marked so that Tor does not vote on them. See section 1.4 for details.

The minimum percentage is 60% in Torflow, so sbws uses 60% as the default.

This Line was added in version 1.2.0 of this specification.

"number_consensus_relays" Int NL

[Zero or one time.]

The number of relays in the consensus.

This Line was added in version 1.2.0 of this specification.

"percent_eligible_relays" Int NL

[Zero or one time.]

The number of eligible relays, as a percentage of the number of relays in the consensus.

This line SHOULD be equal to:

    (number_eligible_relays * 100.0) / number_consensus_relays

This Line was added in version 1.2.0 of this specification.

"minimum_number_eligible_relays" Int NL

[Zero or one time.]

The minimum number of relays that SHOULD be included in the bandwidth file. See minimum_percent_eligible_relays for details.

This line SHOULD be equal to:

    number_consensus_relays * (minimum_percent_eligible_relays / 100.0)

This Line was added in version 1.2.0 of this specification.

"scanner_country" CountryCode NL

[Zero or one time.]

The country, as in political geolocation, where the generator is run.

This Line was added in version 1.2.0 of this specification.

"destinations_countries" CountryCodeList NL

[Zero or one time.]

The country, as in political geolocation, or countries where the destination Web server(s) are located. The destination Web Servers serve the data that the generator retrieves to measure the bandwidth.

This Line was added in version 1.2.0 of this specification.
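The relationships between the relay-count header lines above ("number_eligible_relays", "number_consensus_relays", "percent_eligible_relays", "minimum_percent_eligible_relays", and "minimum_number_eligible_relays") can be illustrated with a short Python sketch. The function names are hypothetical. The header values are of type Int, and how a generator rounds the results is not specified above, so the sketch returns unrounded values.

```python
def percent_eligible_relays(number_eligible_relays, number_consensus_relays):
    """(number_eligible_relays * 100.0) / number_consensus_relays"""
    return (number_eligible_relays * 100.0) / number_consensus_relays


def minimum_number_eligible_relays(number_consensus_relays,
                                   minimum_percent_eligible_relays=60):
    """number_consensus_relays * (minimum_percent_eligible_relays / 100.0)

    The default of 60 is the Torflow and sbws default mentioned above.
    """
    return number_consensus_relays * (minimum_percent_eligible_relays / 100.0)


def has_enough_relays(number_eligible_relays, number_consensus_relays,
                      minimum_percent_eligible_relays=60):
    """True if the eligible relays reach the minimum percentage."""
    minimum = minimum_number_eligible_relays(
        number_consensus_relays, minimum_percent_eligible_relays)
    return number_eligible_relays >= minimum


if __name__ == "__main__":
    print(percent_eligible_relays(4900, 7000))
    print(minimum_number_eligible_relays(7000))
    print(has_enough_relays(4900, 7000))
```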

"recent_consensus_count" Int NL

[Zero or one time.]

The number of the different consensuses seen in the last data_period days. (data_period is 5 by default.)

Assuming that Tor clients fetch a consensus every 1-2 hours, and that the data_period is 5 days, the Value of this Key SHOULD be between:

    data_period * 24 / 2 = 60
    data_period * 24 = 120

This Line was added in version 1.4.0 of this specification.

"recent_priority_list_count" Int NL

[Zero or one time.]

The number of times that a list with a subset of relays prioritized to be measured has been created in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be approximately:

    data_period * 24 / 1.5 = 80

Here 1.5 is the approximate number of hours it takes to measure a priority list of 7000 * 0.05 (350) relays, when the fraction of relays in a priority list is 5% (0.05).

This Line was added in version 1.4.0 of this specification.

"recent_priority_relay_count" Int NL

[Zero or one time.]

The number of relays that have been in the list of relays prioritized to be measured in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be approximately:

    80 * (7000 * 0.05) = 28000

Here 0.05 (5%) is the fraction of relays in a priority list and 80 is the approximate number of priority lists (see "recent_priority_list_count").

This Line was added in version 1.4.0 of this specification.

"recent_measurement_attempt_count" Int NL

[Zero or one time.]

The number of times that any relay has been queued to be measured in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be approximately the same as "recent_priority_relay_count", assuming that there is one attempt to measure a relay for each relay that has been prioritized unless there are system, network or implementation issues.

This Line was added in version 1.4.0 of this specification and removed in version 1.5.0.
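The approximate values quoted above for "recent_consensus_count", "recent_priority_list_count", and "recent_priority_relay_count" all follow from the same few inputs. The Python sketch below reproduces the arithmetic; the function and parameter names are hypothetical, and the defaults are the 2019 figures given in the text.

```python
def expected_counts(relays=7000, data_period_days=5,
                    hours_per_priority_list=1.5, list_fraction_percent=5):
    """Reproduce the approximate expected header values from the text."""
    hours = data_period_days * 24
    # Relays in one priority list: 7000 * 0.05 = 350.
    relays_per_list = relays * list_fraction_percent // 100
    # One priority list is measured roughly every 1.5 hours.
    priority_lists = hours / hours_per_priority_list
    return {
        # A consensus is fetched every 1-2 hours.
        "recent_consensus_count_min": hours // 2,
        "recent_consensus_count_max": hours,
        "recent_priority_list_count": priority_lists,
        "recent_priority_relay_count": priority_lists * relays_per_list,
    }


if __name__ == "__main__":
    for key, value in expected_counts().items():
        print(key, value)
```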

"recent_measurement_failure_count" Int NL

[Zero or one time.]

The number of times that the scanner attempted to measure a relay in the last data_period days (5 by default), but the relay has not been measured because of system, network or implementation issues.

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_error_count" Int NL

[Zero or one time.]

The number of relays that have no successful measurements in the last data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_near_count" Int NL

[Zero or one time.]

The number of relays that have some successful measurements in the last data_period days (5 by default), but all those measurements were performed in a period of time that was too short (by default 1 day).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_old_count" Int NL

[Zero or one time.]

The number of relays that have some successful measurements, but all those measurements are too old (more than 5 days, by default).

Excludes relays that are already counted in recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_few_count" Int NL

[Zero or one time.]

The number of relays that don't have enough recent successful measurements. (Fewer than 2 measurements in the last 5 days, by default).

Excludes relays that are already counted in recent_measurements_excluded_near_count and recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"time_to_report_half_network" Int NL

[Zero or one time.]

The time in seconds that it would take to report measurements about half of the network, given the number of eligible relays and the time it took in the last days (5 days, by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"tor_version" version_number NL

[Zero or one time.]

The Tor version of the Tor process controlled by the generator.

This Line was added in version 1.4.0 of this specification.

"mu" Int NL

[Zero or one time.]

The network stream bandwidth average calculated as explained in B4.2.

This Line was added in version 1.7.0 of this specification.

"muf" Int NL

[Zero or one time.]

The filtered network stream bandwidth average, calculated as explained in B4.2.

This Line was added in version 1.7.0 of this specification.

KeyValue NL

[Zero or more times.]

There MUST NOT be multiple KeyValue header Lines with the same key. If there are, the parser SHOULD choose an arbitrary Line.

If a parser does not recognize a Keyword in a KeyValue Line, it MUST be ignored.

Future format versions may include additional KeyValue header Lines. Additional header Lines will be accompanied by a minor version increment.

Implementations MAY add additional header Lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys.

Parsers MUST NOT rely on the order of these additional Lines.

Additional header Lines MUST NOT use any keywords specified in the relay measurements format. If they do, the parser MAY ignore the conflicting keywords.

Terminator NL

[Zero or one time.]

The Header List section ends with a Terminator.

In version 1.0.0, Header List ends when the first relay bandwidth is found conforming to the next section.

Implementations of version 1.1.0 and later SHOULD use a 5-character terminator.

Tor 0.4.0.1-alpha and later look for a 5-character terminator, or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2 used a 4-character terminator; this bug was fixed in 1.0.3.
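The header rules above can be sketched as a small parser. This is only an illustrative sketch, not the Tor or sbws implementation: the function name parse_header is hypothetical, the literal terminator strings ("=====" and the older 4-character "====") are assumed from the Terminator definition elsewhere in this specification, and a relay line is detected by a token that starts with "bw=".

```python
# Assumed literal terminators: the 5-character form, and the 4-character
# form produced by sbws 0.1.0 to 1.0.2.
TERMINATORS = ("=====", "====")


def parse_header(lines):
    """Return (timestamp, header_dict, index_of_first_relay_line)."""
    timestamp = int(lines[0])
    header = {}
    for index, line in enumerate(lines[1:], start=1):
        line = line.rstrip("\n")
        if line in TERMINATORS:
            return timestamp, header, index + 1
        if any(token.startswith("bw=") for token in line.split(" ")):
            # Version 1.0.0 has no terminator: the first relay line
            # ends the Header List.
            return timestamp, header, index
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KeyValue Line, ignore it
        # Duplicate keys are not allowed; a parser may pick any Line.
        # This sketch keeps the first one it sees.
        header.setdefault(key, value)
    return timestamp, header, len(lines)
```

Unrecognized keys stay in the returned dictionary and the caller simply does not look at them, which is one way to satisfy the "MUST be ignored" rule without depending on Line order.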

Relay Line format

It consists of zero or more RelayLines containing relay ids and bandwidths. The relays and their KeyValues are in arbitrary order.

There MUST NOT be multiple KeyValue pairs with the same key in the same RelayLine. If there are, the parser SHOULD choose an arbitrary Value.

There MUST NOT be multiple RelayLines per relay identity (node_id or master_key_ed25519). If there are, parsers SHOULD issue a warning. Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore both RelayLines.

If a parser does not recognize any extra material in a RelayLine, the extra material MUST be ignored.

Each RelayLine includes the following KeyValue pairs:

"node_id" hexdigest

[Exactly once.]

The fingerprint for the relay's RSA identity key.

Note: In bandwidth files read by Tor versions earlier than 0.3.4.1-alpha, node_id MUST NOT be at the end of the Line. These authority versions are no longer supported.

Current Tor versions ignore master_key_ed25519, so node_id MUST be present in each relay Line.

Implementations of version 1.1.0 and later SHOULD include both node_id and master_key_ed25519. Parsers SHOULD accept Lines that contain at least one of them.

"master_key_ed25519" MasterKey

[Zero or one time.]

The relay's master Ed25519 key, base64 encoded, without trailing "="s, to avoid ambiguity with the KeyValue "=" character.

This KeyValue pair SHOULD be present, see the note under node_id.

This KeyValue was added in version 1.1.0 of this specification.

"bw" Bandwidth

[Exactly once.]

The bandwidth of this relay in kilobytes per second.

No Zero Bandwidths: Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth. If there are zero bandwidths, the parser MAY ignore them.

Bandwidth Aggregation: Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average.

Bandwidth Scaling: Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling.

If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.

MaxAdvertisedBandwidth: Bandwidth generators MUST limit the relays' measured bandwidth based on the MaxAdvertisedBandwidth. A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth, BandwidthRate, RelayBandwidthRate, BandwidthBurst, and RelayBandwidthBurst. Therefore, generators MUST limit a relay's measured bandwidth to its descriptor's bandwidth-avg. This limit needs to be implemented in the generator, because generators may scale consensus weights before sending them to Tor. Generators SHOULD NOT limit measured bandwidths based on descriptors' bandwidth-observed, because that penalises new relays.

sbws limits the relay's measured bandwidth to the bandwidth-avg advertised.

Torflow partitions relays based on their bandwidth. For unmeasured relays, Torflow uses the minimum of all descriptor bandwidths, including bandwidth-avg (MaxAdvertisedBandwidth) and bandwidth-observed. Then Torflow measures the relays in each partition against each other, which implicitly limits a relay's measured bandwidth to the bandwidths of similar relays.

Torflow also generates consensus weights based on the ratio between the measured bandwidth and the minimum of all descriptor bandwidths (at the time of the measurement). So when an operator reduces the MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's measured bandwidth.
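As a rough illustration of the MaxAdvertisedBandwidth limit and the "No Zero Bandwidths" rule above, a generator could clamp each value as follows. This is a sketch under assumptions: the function name limit_measured_bw is hypothetical, and both arguments are assumed to be in the same unit already.

```python
def limit_measured_bw(measured_bw, desc_bw_avg):
    """Cap a measured bandwidth at the descriptor's bandwidth-avg.

    Both values are assumed to use the same unit. The result is never
    zero: one is used as the minimum bandwidth.
    """
    capped = min(measured_bw, desc_bw_avg)
    return max(1, capped)
```

The cap deliberately uses bandwidth-avg rather than bandwidth-observed, following the text above about not penalising new relays.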

KeyValue

[Zero or more times.]

Future format versions may include additional KeyValue pairs on a RelayLine. Additional KeyValue pairs will be accompanied by a minor version increment.

Implementations MAY add additional relay KeyValue pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same Keywords.

Parsers MUST NOT rely on the order of these additional KeyValue pairs.

Additional KeyValue pairs MUST NOT use any keywords specified in the header format. If they do, the parser MAY ignore the conflicting keywords.
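The RelayLine rules in this section can be sketched as follows. This is a hedged example, not the parser used by Tor: the function names are hypothetical, and the choices it makes where the text allows several behaviours (keeping the first duplicate value, keeping the first RelayLine per identity, skipping Lines without bw) are assumptions.

```python
def parse_relay_line(line):
    """Parse one RelayLine into a dict, or return None if required keys are missing."""
    pairs = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # extra material that is not a KeyValue is ignored
        # Duplicate key in the same RelayLine: arbitrarily keep the first Value.
        pairs.setdefault(key, value)
    if "bw" not in pairs:
        return None
    if "node_id" not in pairs and "master_key_ed25519" not in pairs:
        return None  # accept Lines that contain at least one identity
    return pairs


def parse_relay_lines(lines):
    """Return ({identity: pairs}, [duplicate identities])."""
    relays = {}
    duplicates = []
    for line in lines:
        pairs = parse_relay_line(line)
        if pairs is None:
            continue
        identity = pairs.get("node_id") or pairs["master_key_ed25519"]
        if identity in relays:
            duplicates.append(identity)  # a real parser should warn here
            continue
        relays[identity] = pairs
    return relays, duplicates
```

Because every pair is looked up by key, the sketch never depends on the order of KeyValue pairs within a RelayLine.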

Implementation details

Writing bandwidth files atomically

To avoid inconsistent reads, implementations SHOULD write bandwidth files atomically. If the file is transferred from another host, it SHOULD be written to a temporary path, then renamed to the V3BandwidthsFile path.

sbws versions 0.7.0 and later write the bandwidth file to an archival location, create a temporary symlink to that location, then atomically rename the symlink to the configured V3BandwidthsFile path.

Torflow does not write bandwidth files atomically.
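One common way to get an atomic write is to write a temporary file in the same directory and rename it over the target. The sketch below assumes a POSIX-like filesystem where a rename within one directory is atomic; the function name is hypothetical, and this is not the symlink-based scheme that sbws uses.

```python
import os
import tempfile


def write_bandwidth_file_atomically(path, content):
    """Write content to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # so it is created in the target's directory.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace renames over an existing file in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Here path would be the configured V3BandwidthsFile path. A reader that opens the file during the update sees either the old file or the new one, never a mixture.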

Additional KeyValue pair definitions

KeyValue pairs in RelayLines that current implementations generate.

Simple Bandwidth Scanner

sbws RelayLines contain these keys:

"node_id" hexdigest

As above.

"bw" Bandwidth

As above.

"nick" nickname

[Exactly once.]

The relay nickname.

Torflow also has a "nick" KeyValue.

"rtt" Int

[Zero or one time.]

The Round Trip Time in milliseconds to obtain 1 byte of data.

This KeyValue was added in version 1.1.0 of this specification. It became optional in version 1.3.0 or 1.4.0 of this specification.

"time" DateTime

[Exactly once.]

The date and time timestamp in ISO 8601 format and UTC time zone when the last bandwidth was obtained.

This KeyValue was added in version 1.1.0 of this specification. The Torflow equivalent is "measured_at".

"success" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay were successful.

This KeyValue was added in version 1.1.0 of this specification.

"error_circ" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay failed because of circuit failures.

This KeyValue was added in version 1.1.0 of this specification. The Torflow equivalent is "circ_fail".

"error_stream" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay failed because of stream failures.

This KeyValue was added in version 1.1.0 of this specification.

"error_destination" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay failed because the destination Web server was not available.

This KeyValue was added in version 1.4.0 of this specification.

"error_second_relay" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay failed because sbws could not find a second relay for the test circuit.

This KeyValue was added in version 1.4.0 of this specification.

"error_misc" Int

[Zero or one time.]

The number of times that the bandwidth measurements for this relay failed because of other reasons.

This KeyValue was added in version 1.1.0 of this specification.

"bw_mean" Int

[Zero or one time.]

The measured bandwidth mean for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"bw_median" Int

[Zero or one time.]

The measured bandwidth median for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_bw_avg" Int

[Zero or one time.]

The descriptor average bandwidth for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_bw_obs_last" Int

[Zero or one time.]

The last descriptor observed bandwidth for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_bw_obs_mean" Int

[Zero or one time.]

The descriptor observed bandwidth mean for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"desc_bw_bur" Int

[Zero or one time.]

The descriptor burst bandwidth for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"consensus_bandwidth" Int

[Zero or one time.]

The consensus bandwidth for this relay in bytes per second.

This KeyValue was added in version 1.2.0 of this specification.

"consensus_bandwidth_is_unmeasured" Bool

[Zero or one time.]

If the consensus bandwidth for this relay was not obtained from three or more bandwidth authorities, this KeyValue is True; otherwise, it is False.

This KeyValue was added in version 1.2.0 of this specification.

"relay_in_recent_consensus_count" Int

[Zero or one time.]

The number of times this relay was found in a consensus in the last data_period days. (Unless otherwise stated, data_period is 5 by default.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_priority_list_count" Int

[Zero or one time.]

The number of times this relay has been prioritized to be measured in the last data_period days.

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurement_attempt_count" Int

[Zero or one time.]

The number of times a measurement of this relay was attempted in the last data_period days.

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurement_failure_count" Int

[Zero or one time.]

The number of times a measurement of this relay was attempted in the last data_period days, but it was not possible to obtain a measurement.

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_error_count" Int

[Zero or one time.]

The number of recent relay measurement attempts that failed. Measurements are recent if they are in the last data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_near_count" Int

[Zero or one time.]

When all of a relay's recent successful measurements were performed in a period of time that was too short (by default 1 day), the relay is excluded. This KeyValue contains the number of recent successful measurements for the relay that were ignored for this reason.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_old_count" Int

[Zero or one time.]

The number of successful measurements for this relay that are too old (more than data_period days, 5 by default).

Excludes measurements that are already counted in relay_recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"relay_recent_measurements_excluded_few_count" Int

[Zero or one time.]

The number of successful measurements for this relay that were ignored because the relay did not have enough successful measurements (fewer than 2, by default).

Excludes measurements that are already counted in relay_recent_measurements_excluded_near_count or relay_recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This KeyValue was added in version 1.4.0 of this specification.

"under_min_report" bool

[Zero or one time.]

If the value is 1, there are not enough eligible relays in the bandwidth file, and Tor bandwidth authorities MAY NOT vote on this relay. (Current Tor versions do not change their behaviour based on the "under_min_report" key.)

If the value is 0 or the KeyValue is not present, there are enough relays in the bandwidth file.

Because Tor versions released before April 2019 (see section 1.4. for the full list of versions) ignore "vote=0", generator implementations MUST NOT change the bandwidths for under_min_report relays. Using the same bw value makes authorities that do not understand "vote=0" or "under_min_report=1" produce votes that don't change relay weights too much. It also avoids flapping when the reporting threshold is reached.

This KeyValue was added in version 1.4.0 of this specification.

"unmeasured" bool

[Zero or one time.]

If the value is 1, this relay was not successfully measured and Tor bandwidth authorities MAY NOT vote on this relay. (Current Tor versions do not change their behaviour based on the "unmeasured" key.)

If the value is 0 or the KeyValue is not present, this relay was successfully measured.

Because Tor versions released before April 2019 (see section 1.4. for the full list of versions) ignore "vote=0", generator implementations MUST set "bw=1" for unmeasured relays. Using the minimum bw value makes authorities that do not understand "vote=0" or "unmeasured=1" produce votes that don't change relay weights too much.

This KeyValue was added in version 1.4.0 of this specification.

"vote" bool

[Zero or one time.]

If the value is 0, Tor directory authorities SHOULD ignore the relay's entry in the bandwidth file. They SHOULD vote for the relay the same way they would vote for a relay that is not present in the file.

This MAY be the case when this relay was not successfully measured but is included in the Bandwidth File, to diagnose why it was not measured.

If the value is 1 or the KeyValue is not present, Tor directory authorities MUST use the relay's bw value in any votes for that relay.

Implementations MUST also set "bw=1" for unmeasured relays. But they MUST NOT change the bw for under_min_report relays. (See the explanations under "unmeasured" and "under_min_report" for more details.)

This KeyValue was added in version 1.4.0 of this specification.
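The interaction between "unmeasured", "under_min_report", "vote", and "bw" can be summarised in a short sketch. It assumes a generator that sets "vote=0" in both cases, which the text above allows but does not require; the function name and its parameters are hypothetical.

```python
def relay_vote_fields(bw, measured, enough_eligible_relays):
    """Return the bw/vote-related KeyValues for one relay as a dict."""
    if not measured:
        # Unmeasured relays MUST get the minimum bandwidth, so that
        # authorities that ignore "vote=0" barely change relay weights.
        return {"bw": 1, "unmeasured": 1, "vote": 0}
    if not enough_eligible_relays:
        # The bw MUST NOT change for under_min_report relays, which
        # also avoids flapping around the reporting threshold.
        return {"bw": bw, "under_min_report": 1, "vote": 0}
    # "vote" is omitted: an absent KeyValue means the bw is used.
    return {"bw": bw}
```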

"xoff_recv" Int

[Zero or one time.]

The number of times this relay received XOFF_RECV stream events while being measured in the last data_period days.

This KeyValue was added in version 1.6.0 of this specification.

"xoff_sent" Int

[Zero or one time.]

The number of times this relay received XOFF_SENT stream events while being measured in the last data_period days.

This KeyValue was added in version 1.6.0 of this specification.

"r_strm" Float

[Zero or one time.]

The stream ratio of this relay calculated as explained in B4.3.

This KeyValue was added in version 1.7.0 of this specification.

"r_strm_filt" Float

[Zero or one time.]

The filtered stream ratio of this relay calculated as explained in B4.3.

This KeyValue was added in version 1.7.0 of this specification.

Torflow

Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git

2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
   The Torflow specification is outdated, and does not match the current implementation. See section A.1. for the format produced by Torflow.

3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt

4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt

5. https://semver.org/

Sample data

The following has not been obtained from any real measurement.

Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

Generated by sbws version 0.1.0

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26

bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

Generated by sbws version 1.0.3

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
software=sbws
software_version=1.0.3

bw=38000 bw_mean=1127824 bw_median=1180062 desc_bw_avg=1073741824 desc_bw_obs_last=17230879 desc_bw_obs_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=1 bw_mean=199162 bw_median=185675 desc_bw_avg=409600 desc_bw_obs_last=836165 desc_bw_obs_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

When there are not enough eligible measured relays:

1540496079
version=1.2.0
earliest_bandwidth=2018-10-20T19:35:52
file_created=2018-10-25T19:35:03
generator_started=2018-10-25T11:42:56
latest_bandwidth=2018-10-25T19:34:39
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=2960
percent_eligible_relays=46
software=sbws
software_version=1.0.3

Headers generated by sbws version 1.0.4

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
scanner_country=SN
software=sbws
software_version=1.0.4

Generated by sbws version 1.1.0

1523911758
version=1.4.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
recent_measurement_attempt_count=6243
recent_measurement_failure_count=732
recent_measurements_excluded_error_count=969
recent_measurements_excluded_few_count=3946
recent_measurements_excluded_near_count=90
recent_measurements_excluded_old_count=0
recent_priority_list_count=20
recent_priority_relay_count=6243
scanner_country=SN
software=sbws
software_version=1.1.0
time_to_report_half_network=57273

bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0
bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0

Scaling bandwidths

Scaling requirements

Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, scaling methods SHOULD perform the following checks:

* If the total bandwidth is zero, all relays should be given equal bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.

Initial experiments indicate that scaling may not be needed for torflow and sbws, because their measured bandwidths are similar enough already.

A linear scaling method

If scaling is required, here is a simple linear bandwidth scaling method, which ensures that all bandwidth votes contain approximately the same total bandwidth:

1. Calculate the relay quota by dividing the total measured bandwidth in all votes, by the number of relays with measured bandwidth votes. In the public tor network, this is approximately 7500 as of April 2018. The quota should be a consensus parameter, so it can be adjusted for all generators on the network.

2. Calculate a vote quota by multiplying the relay quota by the number of relays this bandwidth authority has measured bandwidths for.

3. Calculate a scaling factor by dividing the vote quota by the total unscaled measured bandwidth in this bandwidth authority's upcoming vote.

4. Multiply each unscaled measured bandwidth by the scaling factor.

Now, the total scaled bandwidth in the upcoming vote is approximately equal to the quota.
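The linear method above, together with the zero-bandwidth checks from the scaling requirements, could look like the following sketch. It takes the relay quota from step 1 as an input, because that value depends on all votes; the function name and the use of round() for integer output are assumptions.

```python
def linear_scale(measured, relay_quota):
    """Scale {relay_id: unscaled bandwidth} so the total is about the vote quota."""
    if not measured:
        return {}
    # Step 2: vote quota for this bandwidth authority.
    vote_quota = relay_quota * len(measured)
    total = sum(measured.values())
    if total == 0:
        # Zero total bandwidth: give all relays equal bandwidths.
        return {relay: max(1, round(relay_quota)) for relay in measured}
    # Step 3: scaling factor.
    factor = vote_quota / total
    # Step 4: scale each bandwidth, rounding zero results up to one.
    return {relay: max(1, round(bw * factor)) for relay, bw in measured.items()}
```

For example, with a relay quota of 7500 and two relays measured at 100 and 300, the vote quota is 15000, the factor is 37.5, and the scaled values are 3750 and 11250.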

Quota changes

If all generators are using scaling, the quota can be gradually reduced or increased as needed. Smaller quotas decrease the size of uncompressed consensuses, and may decrease the size of consensus diffs and compressed consensuses. But if the relay quota is too small, some relays may be over- or under-weighted.

Torflow aggregation

Torflow implements two methods to compute the bandwidth values from the (stream) bandwidth measurements: with and without PID control feedback. The method described here is without PID control (see Torflow specification, section 2.2).

In the following sections, the relays' measured bandwidths refer to the ones that this bandwidth authority has measured for the relays that would be included in the next bandwidth authority's upcoming vote.

1. Calculate the filtered bandwidth for each relay:

   - choose the relay's measurements (`bw_j`) that are equal to or greater than the mean of the measurements for this relay
   - calculate the mean of those measurements

   In pseudocode:

       bw_filt_i = mean(max(mean(bw_j), bw_j))

2. Calculate network averages:

   - calculate the filtered average by dividing the sum of all the relays' filtered bandwidth by the number of relays that have been measured (`n`), i.e., calculate the mean of the relays' filtered bandwidth.
   - calculate the stream average by dividing the sum of all the relays' measured bandwidth by the number of relays that have been measured (`n`), i.e., calculate the mean of the relays' measured bandwidth.

   In pseudocode:

       bw_avg_filt = sum(bw_filt_i) / n
       bw_avg_strm = sum(bw_i) / n

3. Calculate ratios for each relay:

   - calculate the filtered ratio by dividing each relay's filtered bandwidth by the filtered average
   - calculate the stream ratio by dividing each relay's measured bandwidth by the stream average

   In pseudocode:

       r_filt_i = bw_filt_i / bw_avg_filt
       r_strm_i = bw_i / bw_avg_strm

4. Calculate the final ratio for each relay:

   The final ratio is the larger of the filtered bandwidth's ratio and the stream bandwidth's ratio.

   In pseudocode:

       r_i = max(r_filt_i, r_strm_i)

5. Calculate the scaled bandwidth for each relay:

   The most recent descriptor observed bandwidth (`bw_obs_i`) is multiplied by the ratio.

   In pseudocode:

       bw_new_i = r_i * bw_obs_i

<<In this way, the resulting network status consensus bandwidth values are effectively re-weighted proportional to how much faster the node was as compared to the rest of the network.>>
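The five steps above can be sketched in Python as follows. This is an illustration, not Torflow's code. It makes two assumptions that are not stated in the text: a relay's measured bandwidth (bw_i) is taken to be the mean of its stream measurements, and step 1 follows the prose description (keep only the measurements at or above the relay's mean). The function name torflow_scale is hypothetical, and at least one non-zero measurement is assumed so the averages are not zero.

```python
from statistics import mean


def torflow_scale(measurements, bw_obs):
    """measurements: {relay: [stream bandwidths]}, bw_obs: {relay: observed bandwidth}."""
    # Assumption: bw_i is the mean of the relay's measurements.
    bw_strm = {relay: mean(values) for relay, values in measurements.items()}
    # Step 1: mean of the measurements at or above the relay's own mean.
    bw_filt = {
        relay: mean([bw for bw in values if bw >= bw_strm[relay]])
        for relay, values in measurements.items()
    }
    # Step 2: network averages over the n measured relays.
    n = len(measurements)
    bw_avg_strm = sum(bw_strm.values()) / n
    bw_avg_filt = sum(bw_filt.values()) / n
    scaled = {}
    for relay in measurements:
        # Step 3: ratios for this relay.
        r_strm = bw_strm[relay] / bw_avg_strm
        r_filt = bw_filt[relay] / bw_avg_filt
        # Steps 4 and 5: larger ratio times the observed bandwidth.
        scaled[relay] = max(r_filt, r_strm) * bw_obs[relay]
    return scaled
```

The filtered list in step 1 is never empty, because the largest measurement of a relay is always at or above that relay's mean.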

Tor Directory List Format

                         Tim Wilson-Brown (teor)

Table of Contents

    1. Scope and Preliminaries
        1.1. Format Overview
        1.2. Acknowledgements
        1.3. Format Versions
        1.4. Future Plans
    2. Format Details
        2.1. Nonterminals
        2.2. List Header
            2.2.1. List Header Format
        2.3. List Generation
            2.3.1. List Generation Format
        2.4. Directory Entry
            2.4.1. Directory Entry Format
    3. Usage Considerations
        3.1. Caching
        3.2. Retrieving Directory Information
        3.3. Fallback Reliability
    A.1. Sample Data
        A.1.1. Sample Fallback List Header
        A.1.2. Sample Fallback List Generation
        A.1.3. Sample Fallback Entries

Scope and Preliminaries

This document describes the format of Tor's directory lists, which are compiled and hard-coded into the tor binary. There is currently one list: the fallback directory mirrors. This list is also parsed by other libraries, like stem and metrics-lib. Alternate Tor implementations can use this list to bootstrap from the latest public Tor directory information.

The FallbackDir feature was introduced by proposal 210, and was first supported by Tor in Tor version 0.2.4.7-alpha. The first hard-coded list was shipped in 0.2.8.1-alpha.

The hard-coded fallback directory list is located in the tor source repository at:

src/app/config/fallback_dirs.inc

In Tor 0.3.4 and earlier, the list is located at:

src/or/fallback_dirs.inc

This document describes version 2.0.0 and later of the directory list format.

Legacy, semi-structured versions of the fallback list were released with Tor 0.2.8.1-alpha through Tor 0.3.1.9. We call this format version 1. Stem and Relay Search have parsers for this legacy format.

Format Overview

A directory list is a C code fragment containing an array of C string constants. Each double-quoted C string constant is a valid torrc FallbackDir entry. Each entry contains various data fields.

Directory lists do not include the C array's declaration, or the array's terminating NULL. Entries in directory lists do not include the FallbackDir torrc option. These are handled by the including C code.

Directory lists also include C-style comments and whitespace. The presence of whitespace may be significant, but the amount of whitespace is never significant. The type of whitespace is not significant to the C compiler or Tor C string parser. However, other parsers MAY rely on the distinction between newlines and spaces. (And that the only whitespace characters in the list are newlines and spaces.)

The directory entry C string constants are split over multiple lines for readability. Structured C-style comments are used to provide additional data fields. This information is not used by Tor, but may be of interest to other libraries.

The order of directory entries and data fields is not significant, except where noted below.

Acknowledgements

The original fallback directory script and format were created by weasel. The current script uses code written by gsathya & karsten.

This specification was revised after feedback from:

Damian Johnson ("atagar") Iain R. Learmonth ("irl")

Format Versions

The directory list format uses semantic versioning: https://semver.org

In particular:

* major versions are used for incompatible changes, like removing non-optional fields
* minor versions are used for compatible changes, like adding fields
* patch versions are for bug fixes, like fixing an incorrectly-formatted Summary item

1.0.0 - The legacy fallback directory list format

2.0.0 - Adds name and extrainfo structured comments, and section separator comments to make the list easier to parse. Also adds a source list comment to the header.

3.0.0 - Modifies the format of the source list comment.

Future Plans

Tor also has an auth_dirs.inc file, but it is not yet in this format. Tor uses slightly different formats for authorities and fallback directory mirrors, so we will need to make some changes to tor so that it parses this format. (We will also need to add authority-specific information to this format.) See #24818 for details.

We want to add a torrc option so operators can opt-in their relays as fallback directory mirrors. This gives us a signed opt-in confirmation. (We can also continue to accept whitelist entries, and do other checks.) We need to write a short proposal, and make some changes to tor and the fallback update script. See #24839 for details.

Format Details

Directory lists contain the following sections:

- List Header (exactly once)
- List Generation (exactly once, may be empty)
- Directory Entry (zero or more times)

Each section (or entry) ends with a separator.

Nonterminals

The following nonterminals are defined in the Onionoo details document specification:

dir_address
fingerprint
nickname

See https://metrics.torproject.org/onionoo.html#details

The following nonterminals are defined in the "Tor directory protocol" specification in dir-spec.txt:

Keyword
ArgumentChar
NL (newline)
SP (space)
bool (must not be confused with Onionoo's JSON "boolean")

We derive the following nonterminals from Onionoo and dir-spec.txt:

ipv4_or_port ::= port from an IPv4 or_addresses item

The ipv4_or_port is the port part of an IPv4 address from the Onionoo or_addresses list.

ipv6_or_address ::= an IPv6 or_addresses item

The ipv6_or_address is an IPv6 address and port from the Onionoo or_addresses list. The address MAY be in the canonical RFC 5952 IPv6 address format.

A key-value pair:

value ::= Zero or more ArgumentChar, excluding the following strings:

* a double quotation mark (DQUOTE), and
* the C comment terminators ("/*" and "*/").

Note that the C++ comment ("//") and equals sign ("=") are not excluded, because they are reserved for future use in base64 values.

key_value ::= Keyword "=" value

We also define these additional nonterminals:

number ::= An optional negative sign ("-"), followed by one or more numeric characters ([0-9]), with an optional decimal part (".", followed by one or more numeric characters).

separator ::= "/*" SP+ "=====" SP+ "*/"
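The number and separator nonterminals translate directly into regular expressions. The following Python sketch is illustrative only: the names NUMBER_RE, SEPARATOR_RE, is_number, and is_separator_line are hypothetical, and SP is taken to be a single ASCII space.

```python
import re

# number: optional "-", one or more digits, optional "." plus one or more digits.
NUMBER_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")

# separator: "/*" SP+ "=====" SP+ "*/"
SEPARATOR_RE = re.compile(r"/\* +===== +\*/")


def is_number(text):
    """Return True if text matches the number nonterminal exactly."""
    return NUMBER_RE.fullmatch(text) is not None


def is_separator_line(line):
    """Return True if line is "separator SP* NL" (the NL itself is optional here)."""
    stripped = line.rstrip("\n").rstrip(" ")
    return SEPARATOR_RE.fullmatch(stripped) is not None
```

Using fullmatch rather than search keeps strings such as "1." or ".5" from being accepted as numbers.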

List Header

The list header consists of a number of key-value pairs, embedded in C-style comments.

List Header Format

"/*" SP+ "type=" Keyword SP+ "*/" SP* NL

[At start, exactly once.]

The type of directory entries in the list. Parsers SHOULD exit with an error if this is not the first line of the list, or if the value is anything other than "fallback".

"/*" SP+ "version=" version_number SP+ "*/" SP* NL

[In second position, exactly once.]

The version of the directory list format.

version_number is a semantic version, see the "Format Versions" section for details.

Version 1.0.0 represents the undocumented, legacy fallback list format(s). Version 2.0.0 and later are documented by this specification.

"/*" SP+ "timestamp=" number SP+ "*/" SP* NL

[Exactly once.]

A positive integer that indicates when this directory list was generated. This timestamp is guaranteed to increase for every version 2.0.0 and later directory list.

The current timestamp format is YYYYMMDDHHMMSS, as an integer.

"/*" SP+ "source=" Keyword ("," Keyword)* SP+ "*/" SP* NL

[Zero or one time.]

A list of the sources of the directory entries in the list.

As of version 3.0.0, the possible sources are:

* "offer-list" - the fallback_offer_list file in the fallback-scripts repository.
* "descriptor" - one or more signed descriptors, each containing an "offer-fallback-dir" line. This feature will be implemented in ticket #24839.
* "fallback" - a fallback_dirs.inc file from a tor repository. Used in check_existing mode.

Before #24839 is implemented, the default is "offer-list". During the transition to signed offers, it will be "descriptor,offer-list". Afterwards, it will be "descriptor".

In version 2.0.0, only one source name was allowed after "source=", and the deprecated "whitelist" source name was used instead of "offer-list".

This line was added in version 2.0.0 of this specification. The format of this line was modified in version 3.0.0 of this specification.

"/*" SP+ key_value SP+ "*/" SP* NL

[Zero or more times.]

Future releases may include additional header fields. Parsers MUST NOT rely on the order of these additional fields. Additional header fields will be accompanied by a minor version increment.

separator SP* NL

The list header ends with the section separator.
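As an illustration of the header rules above, here is a minimal Python sketch of a header parser. The function name parse_header is hypothetical, and the separator pattern is an assumption taken from the sample header in this document ("/* ===== */"). The sketch does not require a timestamp line, because the sample version 2.0.0 header shown in this document does not carry one.

```python
import re

# One header field line: "/*" SP+ key "=" value SP+ "*/" SP*
HEADER_FIELD = re.compile(r"^/\* +([A-Za-z0-9_-]+)=([^ ]+) +\*/ *$")
# Assumed separator form, copied from the sample header: /* ===== */
SEPARATOR = re.compile(r"^/\* +===== +\*/ *$")


def parse_header(lines):
    """Return the header fields as a dict (hypothetical helper).

    Reads lines up to the section separator and raises ValueError when
    the position or uniqueness rules of the header format are broken.
    """
    fields = {}  # dicts keep insertion order, so key order records position
    for raw in lines:
        line = raw.rstrip("\r\n")
        if SEPARATOR.match(line):
            break
        match = HEADER_FIELD.match(line)
        if match is None:
            raise ValueError(f"malformed header line: {line!r}")
        key, value = match.groups()
        if key in fields:
            raise ValueError(f"header field appears twice: {key}")
        fields[key] = value
    else:
        # The loop ended without seeing a separator line.
        raise ValueError("header is missing the section separator")

    keys = list(fields)
    if keys[:1] != ["type"] or fields["type"] != "fallback":
        raise ValueError('first header line must be "type=fallback"')
    if keys[1:2] != ["version"]:
        raise ValueError('second header line must be "version=..."')
    return fields
```

Unknown additional fields are kept in the returned dict in whatever order they appear, which matches the rule that parsers must not rely on their order.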

List Generation

The list generation information consists of human-readable prose describing the content and origin of this directory list. It is contained in zero or more C-style comments, and may contain multi-line comments and uncommented C code.

In particular, this section may contain C-style comments that contain an equals ("=") character. It may also be entirely empty.

Future releases may arbitrarily change the content of this section. Parsers MUST NOT rely on a version increment when the format changes.

List Generation Format

In general, parsers MUST NOT rely on the format of this section.

Parsers MAY rely on the following details:

The list generation section MUST NOT be a valid directory entry.

The list generation summary MUST end with a section separator:

separator SP* NL

There MUST NOT be any section separators in the list generation section, other than the terminating section separator.

Directory Entry

A directory entry consists of a C string constant, and one or more C-style comments. The C string constant is a valid argument to the DirAuthority or FallbackDir torrc option. The section also contains additional key-value fields in C-style comments.

The list of fallback entries does not include the directory authorities: they are in a separate list. (The Tor implementation combines these lists after parsing them, and applies the DirAuthorityFallbackRate to their weights.)

Directory Entry Format

If a directory entry does not conform to this format, the entry SHOULD be ignored by parsers.

DQUOTE dir_address SP+ "orport=" ipv4_or_port SP+ "id=" fingerprint DQUOTE SP* NL

[At start, exactly once, on a single line.]

This line consists of the following fields:

dir_address
    An IPv4 address and DirPort for this directory, as defined by Onionoo. In this format version, all IPv4 addresses and DirPorts are guaranteed to be non-zero. (For IPv4 addresses, this means that they are not equal to "0.0.0.0".)

ipv4_or_port
    An IPv4 ORPort for this directory, derived from Onionoo. In this format version, all IPv4 ORPorts are guaranteed to be non-zero.

fingerprint
    The relay fingerprint of this directory, as defined by Onionoo. All relay fingerprints are guaranteed to have one or more non-zero digits.

Note: Each double-quoted C string line that occurs after the first line, starts with space inside the quotes. This is a requirement of the Tor implementation.

DQUOTE SP+ "ipv6=" ipv6_or_address DQUOTE SP* NL

[Zero or one time.]

The IPv6 address and ORPort for this directory, as defined by Onionoo. If present, IPv6 addresses and ORPorts are guaranteed to be non-zero. (For IPv6 addresses, this means that they are not equal to "[::]".)

DQUOTE SP+ "weight=" number DQUOTE SP* NL

[Zero or one time.]

A non-negative, real-numbered weight for this directory. The default fallback weight is 1.0, and the default DirAuthorityFallbackRate is 1.0 in legacy Tor versions, and 0.1 in recent Tor versions.

weight was removed in version 2.0.0, but is documented because it may be of interest to libraries implementing Tor's fallback behaviour.

DQUOTE SP+ key_value DQUOTE SP* NL

[Zero or more times.]

Future releases may include additional data fields in double-quoted C string constants. Parsers MUST NOT rely on the order of these additional fields. Additional data fields will be accompanied by a minor version increment.

"/*" SP+ "nickname=" nickname* SP+ "*/" SP* NL

[Exactly once.]

The nickname for this directory, as defined by Onionoo. An empty nickname indicates that the nickname is unknown.

The first fallback list in the 2.0.0 format had nickname lines, but they were all empty.

"/*" SP+ "extrainfo=" bool SP+ "*/" SP* NL

[Exactly once.]

An integer flag that indicates whether this directory caches extra-info documents. Set to 1 if the directory claimed that it cached extra-info documents in its descriptor when the list was created. 0 indicates that it did not, or its descriptor was not available.

The first fallback list in the 2.0.0 format had extrainfo lines, but they were all zero.

"/*" SP+ key_value SP+ "*/" SP* NL

[Zero or more times.]

Future releases may include additional data fields in C-style comments. Parsers MUST NOT rely on the order of these additional fields. Additional data fields will be accompanied by a minor version increment.

separator SP* NL

[Exactly once.]

Each directory entry ends with the section separator.

"," SP* NL

[Exactly once.]

The comma terminates the C string constant. (Multiple C string constants separated by whitespace or comments are coalesced by the C compiler.)
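The entry format above can be sketched as a small Python parser. This is an illustration under stated assumptions, not the reference implementation: parse_entry is a hypothetical name, the fingerprint is assumed to be 40 hexadecimal digits (as in the sample entries), and the caller is assumed to have already split off the lines that precede the entry's section separator. A return value of None means the entry does not conform and should be ignored.

```python
import re

# First line: DQUOTE dir_address SP+ "orport=" port SP+ "id=" fingerprint DQUOTE
FIRST_LINE = re.compile(
    r'^"(\d{1,3}(?:\.\d{1,3}){3}):(\d+) +orport=(\d+) +id=([0-9A-Fa-f]{40})" *$'
)
# Later string lines start with a space inside the quotes: " ipv6=[...]:9001"
STRING_FIELD = re.compile(r'^" +([A-Za-z0-9_-]+)=([^ "]+)" *$')
# Comment fields such as /* nickname=foo */ ; the value may be empty.
COMMENT_FIELD = re.compile(r"^/\* +([A-Za-z0-9_-]+)=([^ ]*) +\*/ *$")


def parse_entry(lines):
    """Parse the lines of one entry that precede its section separator."""
    if not lines:
        return None
    first = FIRST_LINE.match(lines[0])
    if first is None:
        return None
    address, dirport, orport, fingerprint = first.groups()
    entry = {
        "address": address,
        "dirport": int(dirport),
        "orport": int(orport),
        "id": fingerprint.upper(),
    }
    # The format guarantees non-zero addresses, ports and fingerprints.
    if address == "0.0.0.0" or not entry["dirport"] or not entry["orport"]:
        return None
    if set(fingerprint) == {"0"}:
        return None
    # Remaining fields may come in any order, so store them by key.
    for line in lines[1:]:
        field = STRING_FIELD.match(line) or COMMENT_FIELD.match(line)
        if field is None:
            return None
        entry[field.group(1)] = field.group(2)
    return entry
```

Storing the optional fields by key rather than by position follows the rule that parsers must not rely on the order of additional fields.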

Usage Considerations

This section contains recommended library behaviours. It does not affect the format of directory lists.

Caching

The fallback list typically changes once every 6-12 months. The data in the list represents the state of the fallback directory entries when the list was created. Fallbacks can and do change their details over time.

Libraries SHOULD parse and cache the most recent version of these lists during their build or release processes. Libraries MUST NOT retrieve the lists by default every time they are deployed or executed.

The latest fallback list can be retrieved from:

https://gitweb.torproject.org/tor.git/plain/src/or/fallback_dirs.inc

Libraries MUST NOT rely on the availability of the server that hosts these lists.

The list can also be retrieved using:

git clone https://git.torproject.org/tor.git

If you just want the latest list, you may wish to perform a shallow clone.

Retrieving Directory Information

Some libraries retrieve directory documents directly from the Tor Directory Authorities. The directory authorities are designed to support Tor relay and client bootstrap, and MAY choose to rate-limit library access. Libraries MAY provide a user-agent in their requests, if they are not intended to support anonymous operation. (User agents are a fingerprinting vector.)

Libraries SHOULD consider the potential load on the authorities, and whether other sources can meet their needs.

Libraries that require high-uptime availability of Tor directory information should investigate the following options:

* OnionOO: https://metrics.torproject.org/onionoo.html
  * Third-party OnionOO mirrors are also available
* CollecTor: https://collector.torproject.org/
* Fallback Directory Mirrors

Onionoo and CollecTor are typically updated every hour on a regular schedule. Fallbacks update their own directory information at random intervals; see dir-spec for details.

Fallback Reliability

The fallback list is typically regenerated when the fallback failure rate exceeds 25%. Libraries SHOULD NOT rely on any particular fallback being available, or some proportion of fallbacks being available.

Libraries that use fallbacks MAY wish to query an authority after a few fallback queries fail. For example, Tor clients try 3-4 fallbacks before trying an authority.
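The retry behaviour described above might look like the following Python sketch. The function name, the fetch callback, and the choice of OSError as the failure signal are all assumptions made for illustration; the limit of three fallback attempts mirrors the "3-4 fallbacks" figure quoted for Tor clients.

```python
import random


def fetch_directory_info(fallbacks, authorities, fetch, max_fallback_tries=3):
    """Try a few random fallbacks, then fall back to the authorities.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    server and either returns a document or raises OSError on failure.
    """
    tries = min(max_fallback_tries, len(fallbacks))
    for server in random.sample(fallbacks, tries):
        try:
            return fetch(server)
        except OSError:
            continue  # this fallback may be gone; try the next one
    # No fallback answered: query the authorities in random order.
    for server in random.sample(authorities, len(authorities)):
        try:
            return fetch(server)
        except OSError:
            continue
    raise OSError("no fallback or authority could be reached")
```

Picking fallbacks at random avoids depending on any particular fallback being available.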

Sample Data

A sample version 2.0.0 fallback list is available here:

https://trac.torproject.org/projects/tor/raw-attachment/ticket/22759/fallback_dirs_new_format_version.4.inc

A sample transitional version 2.0.0 fallback list is available here:

https://raw.githubusercontent.com/teor2345/tor/fallback-format-2-v4/src/or/fallback_dirs.inc

Sample Fallback List Header

/* type=fallback */
/* version=2.0.0 */
/* ===== */

Sample Fallback List Generation

/* Whitelist & blacklist excluded 1326 of 1513 candidates. */
/* Checked IPv4 DirPorts served a consensus within 15.0s. */
/*
Final Count: 151 (Eligible 187, Target 392 (1963 * 0.20), Max 200)
Excluded: 36 (Same Operator 27, Failed/Skipped Download 9, Excess 0)
Bandwidth Range: 1.3 - 40.0 MByte/s
*/
/*
Onionoo Source: details Date: 2017-05-16 07:00:00 Version: 4.0
URL: https:onionoo.torproject.orgdetails?fields=fingerprint%2Cnickname%2Ccontact%2Clast_changed_address_or_port%2Cconsensus_weight%2Cadvertised_bandwidth%2Cor_addresses%2Cdir_address%2Crecommended_version%2Cflags%2Ceffective_family%2Cplatform&flag=V2Dir&type=relay&last_seen_days=-0&first_seen_days=30-
*/
/*
Onionoo Source: uptime Date: 2017-05-16 07:00:00 Version: 4.0
URL: https:onionoo.torproject.orguptime?first_seen_days=30-&flag=V2Dir&type=relay&last_seen_days=-0
*/
/* ===== */

Sample Fallback Entries

"176.10.104.240:80 orport=443 id=0111BA9B604669E636FFD5B503F382A4B7AD6E80"
/* nickname=foo */
/* extrainfo=1 */
/* ===== */
,
"5.9.110.236:9030 orport=9001 id=0756B7CD4DFC8182BE23143FAC0642F515182CEB"
" ipv6=[2a01:4f8:162:51e2::2]:9001"
/* nickname= */
/* extrainfo=0 */
/* ===== */
,

Tor network parameters

This file lists the recognized parameters that can appear on the "params" line of a directory consensus.

Table of Contents

Network protocol parameters

Performance-tuning parameters

Voting-related parameters

Circuit-build-timeout parameters

Directory-related parameters

Pathbias parameters

Relay behavior

V3 onion service parameters

Denial-of-service parameters

Padding-related parameters

Guard-related parameters

Obsolete parameters
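Each parameter in this file is listed with a minimum, a maximum, and a default. As a rough illustration of how a reader of the "params" line might apply them, the Python sketch below parses the line and clamps a looked-up value into its allowed range. The function names are hypothetical. The space-separated "key=value" layout with integer values is an assumption about the consensus format (see dir-spec), and clamping out-of-range values, rather than rejecting them, is likewise an assumption.

```python
def parse_params_line(line):
    """Parse a consensus "params" line into a dict of integer values."""
    parts = line.split()
    if not parts or parts[0] != "params":
        raise ValueError("not a params line")
    params = {}
    for item in parts[1:]:
        key, sep, value = item.partition("=")
        if not key or not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        params[key] = int(value)  # raises ValueError for non-integers
    return params


def get_param(params, name, default, min_value, max_value):
    """Look up a parameter, using the default if absent, then clamp it."""
    value = params.get(name, default)
    return max(min_value, min(max_value, value))


# Example with "circwindow" (Min: 100, Max: 1000, Default: 1000).
params = parse_params_line("params circwindow=50 bwweightscale=10000")
circwindow = get_param(params, "circwindow", 1000, 100, 1000)
```

In this example the listed value of 50 is below the minimum, so the lookup yields 100.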

Network protocol parameters

"circwindow" -- the default package window that circuits should be established with. It started out at 1000 cells, but some research indicates that a lower value would mean fewer cells in transit in the network at any given time. Min: 100, Max: 1000, Default: 1000 First-appeared: Tor 0.2.1.20

"UseOptimisticData" -- If set to zero, clients by default shouldn't try to send optimistic data to servers until they have received a RELAY_CONNECTED cell. Min: 0, Max: 1, Default: 1 First-appeared: 0.2.3.3-alpha Default was 0 before: 0.2.9.1-alpha Removed in 0.4.5.1-alpha; now always on.

"usecreatefast" -- Used to control whether clients use the CREATE_FAST handshake on the first hop of their circuits. Min: 0, Max: 1. Default: 1. First-appeared: 0.2.4.23, 0.2.5.2-alpha Removed in 0.4.5.1-alpha; now always off.

"min_paths_for_circs_pct" -- A percentage threshold that determines whether clients believe they have enough directory information to build circuits. This value applies to the total fraction of bandwidth-weighted paths that the client could build; see path-spec.txt for more information. Min: 25, Max: 95, Default: 60 First-appeared: 0.2.4

"ExtendByEd25519ID" -- If true, clients should include Ed25519 identities for relays when generating EXTEND2 cells. Min: 0. Max: 1. Default: 0. First-appeared: 0.3.0

"sendme_emit_min_version" -- Minimum SENDME version that can be sent. Min: 0. Max: 255. Default 0. First appeared: 0.4.1.1-alpha.

"sendme_accept_min_version" -- Minimum SENDME version that is accepted. Min: 0. Max: 255. Default 0. First appeared: 0.4.1.1-alpha.

"allow-network-reentry" -- If true, the Exit relays allow connections that are exiting the network to re-enter. If false, any exit connections going to a relay ORPort or an authority ORPort and DirPort is denied and the stream is terminated. Min: 0. Max: 1. Default: 0 First appeared: 0.4.5.1-alpha.

Performance-tuning parameters

"CircuitPriorityHalflifeMsec" -- the halflife parameter used when weighting which circuit will send the next cell. Obeyed by Tor 0.2.2.10-alpha and later. (Versions of Tor between 0.2.2.7-alpha and 0.2.2.10-alpha recognized a "CircPriorityHalflifeMsec" parameter, but mishandled it badly.) Min: 1, Max: 2147483647 (INT32_MAX), Default: 30000. First-appeared: Tor 0.2.2.11-alpha

"perconnbwrate" and "perconnbwburst" -- if set, each relay sets up a separate token bucket for every client OR connection, and rate limits that connection independently. Typically left unset, except when used for performance experiments around trac entry 1750. Only honored by relays running Tor 0.2.2.16-alpha and later. (Note that relays running 0.2.2.7-alpha through 0.2.2.14-alpha looked for bwconnrate and bwconnburst, but then did the wrong thing with them; see bug 1830 for details.) Min: 1, Max: 2147483647 (INT32_MAX), Default: (user setting of BandwidthRate/BandwidthBurst). First-appeared: 0.2.2.7-alpha Removed-in: 0.2.2.16-alpha

"NumNTorsPerTAP" -- When balancing ntor and TAP cells at relays, how many ntor handshakes should we perform for each TAP handshake? Min: 1. Max: 100000. Default: 10. First-appeared: 0.2.4.17-rc

"circ_max_cell_queue_size" -- This parameter determines the maximum number of cells allowed per circuit queue. Min: 1000. Max: 2147483647 (INT32_MAX). Default: 50000. First-appeared: 0.3.3.6-rc.

"KISTSchedRunInterval" -- How frequently should the "KIST" scheduler run in order to decide which data to write to the network? Value in units of milliseconds. Min: 2. Max: 100. Default: 2 First appeared: 0.3.2

"KISTSchedRunIntervalClient" -- How frequently should the "KIST" scheduler run in order to decide which data to write to the network, on clients? Value in units of milliseconds. The client value needs to be much lower than the relay value. Min: 2. Max: 100. Default: 2. First appeared: 0.4.8.2

Voting-related parameters

"bwweightscale" -- Value that bandwidth-weights are divided by. If not present then this defaults to 10000. Min: 1 First-appeared: 0.2.2.10-alpha

"maxunmeasuredbw" -- Used by authorities during voting with method 17 or later. The maximum value to give for any Bandwidth= entry for a router that isn't based on at least three measurements. First-appeared: 0.2.4.11-alpha

"FastFlagMinThreshold", "FastFlagMaxThreshold" -- lowest and highest allowable values for the cutoff for routers that should get the Fast flag. This is used during voting to prevent the threshold for getting the Fast flag from being too low or too high. FastFlagMinThreshold: Min: 4. Max: INT32_MAX: Default: 4. FastFlagMaxThreshold: Min: -. Max: INT32_MAX: Default: INT32_MAX First-appeared: 0.2.3.11-alpha

"AuthDirNumSRVAgreements" -- Minimum number of agreeing directory authority votes required for a fresh shared random value to be written in the consensus (this rule only applies on the first commit round of the shared randomness protocol). Min: 1. Max: INT32_MAX. Default: 2/3 of the total number of dirauth.

Circuit-build-timeout parameters

"cbtdisabled", "cbtnummodes", "cbtrecentcount", "cbtmaxtimeouts", "cbtmincircs", "cbtquantile", "cbtclosequantile", "cbttestfreq", "cbtmintimeout", "cbtlearntimeout", "cbtmaxopencircs", and "cbtinitialtimeout" -- see "2.4.5. Consensus parameters governing behavior" in path-spec.txt for a series of circuit build time related consensus parameters.

Directory-related parameters

"max-consensus-age-to-cache-for-diff" -- Determines how much consensus history (in hours) relays should try to cache in order to serve diffs. (min 0, max 8192, default 72)

"try-diff-for-consensus-newer-than" -- This parameter determines how old a consensus can be (in hours) before a client should no longer try to find a diff for it. (min 0, max 8192, default 72)

Pathbias parameters

"pb_mincircs", "pb_noticepct", "pb_warnpct", "pb_extremepct", "pb_dropguards", "pb_scalecircs", "pb_scalefactor", "pb_multfactor", "pb_minuse", "pb_noticeusepct", "pb_extremeusepct", "pb_scaleuse" -- DOCDOC

Relay behavior

"refuseunknownexits" -- if set to one, exit relays look at the previous hop of circuits that ask to open an exit stream, and refuse to exit if they don't recognize it as a relay. The goal is to make it harder for people to use them as one-hop proxies. See trac entry 1751 for details. Min: 0, Max: 1 First-appeared: 0.2.2.17-alpha

"onion-key-rotation-days" -- (min 1, max 90, default 28)

"onion-key-grace-period-days" -- (min 1, max onion-key-rotation-days, default 7)

Every relay should list each onion key it generates for onion-key-rotation-days days after generating it, and then replace it. Relays should continue to accept their most recent previous onion key for an additional onion-key-grace-period-days days after it is replaced. (Introduced in 0.3.1.1-alpha; prior versions of tor hardcoded both of these values to 7 days.)
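The timeline implied by these two parameters can be worked out as follows. This Python sketch uses a hypothetical function name and only restates the rule above: the key is replaced after onion-key-rotation-days, and the replaced key stays acceptable for a further onion-key-grace-period-days.

```python
from datetime import datetime, timedelta, timezone


def onion_key_schedule(generated, rotation_days=28, grace_days=7):
    """Return (replaced_at, accepted_until) for a key generated at `generated`."""
    if not 1 <= rotation_days <= 90:
        raise ValueError("onion-key-rotation-days must be in 1..90")
    if not 1 <= grace_days <= rotation_days:
        raise ValueError("grace period must be in 1..onion-key-rotation-days")
    # The relay lists the key for rotation_days, then replaces it.
    replaced_at = generated + timedelta(days=rotation_days)
    # The replaced key is still accepted for grace_days after that.
    accepted_until = replaced_at + timedelta(days=grace_days)
    return replaced_at, accepted_until


generated = datetime(2024, 1, 1, tzinfo=timezone.utc)
replaced_at, accepted_until = onion_key_schedule(generated)
```

With the defaults, a key generated on 2024-01-01 is replaced on 2024-01-29 and accepted until 2024-02-05.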

"AllowNonearlyExtend" -- If true, permit EXTEND cells that are not inside RELAY_EARLY cells. Min: 0. Max: 1. Default: 0. First-appeared: 0.2.3.11-alpha

"overload_dns_timeout_scale_percent" -- This value is a percentage of how many DNS timeout over N seconds we accept before reporting the overload general state. It is scaled by a factor of 1000 in order to be able to represent decimal point. As an example, a value of 1000 means 1%. Min: 0. Max: 100000. Default: 1000. First-appeared: 0.4.6.8 Deprecated: 0.4.7.3-alpha-dev

"overload_dns_timeout_period_secs" -- This value is the period in seconds of the DNS timeout measurements (the N in the "overload_dns_timeout_scale_percent" parameter). For this amount of seconds, we will gather DNS statistics and at the end, we'll do an assessment on the overload general signal with regards to DNS timeouts. Min: 0. Max: 2147483647. Default: 600 First-appeared: 0.4.6.8 Deprecated: 0.4.7.3-alpha-dev

"overload_onionskin_ntor_scale_percent" -- This value is a percentage of how many onionskin ntor drop over N seconds we accept before reporting the overload general state. It is scaled by a factor of 1000 in order to be able to represent decimal point. As an example, a value of 1000 means 1%. Min: 0. Max: 100000. Default: 1000. First-appeared: 0.4.7.5-alpha

"overload_onionskin_ntor_period_secs" -- This value is the period in seconds of the onionskin ntor overload measurements (the N in the "overload_onionskin_ntor_scale_percent" parameter). For this amount of seconds, we will gather onionskin ntor statistics and at the end, we'll do an assessment on the overload general signal. Min: 0. Max: 2147483647. Default: 21600 (6 hours) First-appeared: 0.4.7.5-alpha

"assume-reachable" -- If true, relays should publish descriptors even when they cannot make a connection to their IPv4 ORPort. Min: 0. Max: 1. Default: 0. First appeared: 0.4.5.1-alpha.

"assume-reachable-ipv6" -- If true, relays should publish descriptors even when they cannot make a connection to their IPv6 ORPort. Min: 0. Max: 1. Default: 0. First appeared: 0.4.5.1-alpha.

"exit_dns_timeout" -- The time in milliseconds an Exit sets libevent to wait before it considers the DNS timed out. The corresponding libevent option is "timeout:". Min: 1. Max: 120000. Default: 1000 (1sec) First appeared: 0.4.7.5-alpha.

"exit_dns_num_attempts" -- How many attempts after the first should an Exit should try a timing-out DNS query before calling it hopeless? (Each of these attempts will wait for "exit_dns_timeout" independently). The corresponding libevent option is "attempts:". Min: 0. Max: 255. Default: 2 First appeared: 0.4.7.5-alpha.

V3 onion service parameters

"hs_intro_min_introduce2", "hs_intro_max_introduce2" -- Minimum/maximum number of INTRODUCE2 cells allowed per circuit before rotation (actual number picked at random between these two values). Min: 0. Max: INT32_MAX. Defaults: 16384, 32768.

"hs_intro_min_lifetime", "hs_intro_max_lifetime" -- Minimum/maximum lifetime in seconds that a service should keep an intro point for (actual lifetime picked at random between these two values). Min: 0. Max: INT32_MAX. Defaults: 18 hours, 24 hours.

"hs_intro_num_extra" -- Number of extra intro points a service is allowed to open. This concept comes from proposal #155. Min: 0. Max: 128. Default: 2.

"hsdir_interval" -- The length of a time period, in minutes. See rend-spec-v3.txt section [TIME-PERIODS]. Min: 30. Max: 14400. Default: 1440.

"hsdir_n_replicas" -- Number of HS descriptor replicas. Min: 1. Max: 16. Default: 2.

"hsdir_spread_fetch" -- Total number of HSDirs per replica a tor client should select to try to fetch a descriptor. Min: 1. Max: 128. Default: 3.

"hsdir_spread_store" -- Total number of HSDirs per replica a service will upload its descriptor to. Min: 1. Max: 128. Default: 4

"HSV3MaxDescriptorSize" -- Maximum descriptor size (in bytes). Min: 1. Max: INT32_MAX. Default: 50000

"hs_service_max_rdv_failures" -- This parameter determines the maximum number of rendezvous attempt an HS service can make per introduction. Min 1. Max 10. Default 2. First-appeared: 0.3.3.0-alpha.

"HiddenServiceEnableIntroDoSDefense" -- This parameter makes tor start using this defense if the introduction point supports it (for protover HSIntro=5). Min: 0. Max: 1. Default: 0. First appeared: 0.4.2.1-alpha.

"HiddenServiceEnableIntroDoSBurstPerSec" -- Maximum burst to be used for token bucket for the introduction point rate-limiting. Min: 0. Max: INT32_MAX. Default: 200 First appeared: 0.4.2.1-alpha.

"HiddenServiceEnableIntroDoSRatePerSec" -- Refill rate to be used for token bucket for the introduction point rate-limiting. Min: 0. Max: INT32_MAX. Default: 25 First appeared: 0.4.2.1-alpha.

Denial-of-service parameters

Denial of Service mitigation parameters. Introduced in 0.3.3.2-alpha:

"DoSCircuitCreationEnabled" -- Enable the circuit creation DoS mitigation.

"DoSCircuitCreationMinConnections" -- Minimum threshold of concurrent connections before a client address can be flagged as executing a circuit creation DoS

"DoSCircuitCreationRate" -- Allowed circuit creation rate per second per client IP address once the minimum concurrent connection threshold is reached.

"DoSCircuitCreationBurst" -- The allowed circuit creation burst per client IP address once the minimum concurrent connection threshold is reached.

"DoSCircuitCreationDefenseType" -- Defense type applied to a detected client address for the circuit creation mitigation. 1: No defense. 2: Refuse circuit creation for the length of "DoSCircuitCreationDefenseTimePeriod".

"DoSCircuitCreationDefenseTimePeriod" -- The base time period that the DoS defense is activated for.

"DoSConnectionEnabled" -- Enable the connection DoS mitigation.

"DoSConnectionMaxConcurrentCount" -- The maximum threshold of concurrent connection from a client IP address.

"DoSConnectionDefenseType" -- Defense type applied to a detected client address for the connection mitigation. Possible values are: 1: No defense. 2: Immediately close new connections.

"DoSRefuseSingleHopClientRendezvous" -- Refuse establishment of rendezvous points for single hop clients.

Padding-related parameters

"circpad_max_circ_queued_cells" -- The circuitpadding module will stop sending more padding cells if more than this many cells are in the circuit queue for a given circuit. Min: 0. Max: 50000. Default: 1000. First appeared: 0.4.0.3-alpha.

"circpad_global_allowed_cells" -- DOCDOC

"circpad_global_max_padding_pct" -- DOCDOC

"circpad_padding_disabled" -- DOCDOC

"circpad_padding_reduced" -- DOCDOC

"nf_conntimeout_clients" -- DOCDOC

"nf_conntimeout_relays" -- DOCDOC

"nf_ito_high_reduced" -- DOCDOC

"nf_ito_low" -- DOCDOC

"nf_ito_low_reduced" -- DOCDOC

"nf_pad_before_usage" -- DOCDOC

"nf_pad_relays" -- DOCDOC

"nf_pad_single_onion" -- DOCDOC

Guard-related parameters

(See guard-spec.txt for more information on the vocabulary used here.)

"UseGuardFraction" -- If true, clients use GuardFraction information from the consensus in order to decide how to weight guards when picking them. Min: 0. Max: 1. Default: 0. First appeared: 0.2.6

"guard-lifetime-days" -- Controls guard lifetime. If an unconfirmed guard has been sampled more than this many days ago, it should be removed from the guard sample. Min: 1. Max: 3650. Default: 120. First appeared: 0.3.0

"guard-confirmed-min-lifetime-days" -- Controls confirmed guard lifetime: if a guard was confirmed more than this many days ago, it should be removed from the guard sample. Min: 1. Max: 3650. Default: 60. First appeared: 0.3.0

"guard-internet-likely-down-interval" -- If Tor has been unable to build a circuit for this long (in seconds), assume that the internet connection is down, and treat guard failures as unproven. Min: 1. Max: INT32_MAX. Default: 600. First appeared: 0.3.0

"guard-max-sample-size" -- Largest number of guards that clients should try to collect in their sample. Min: 1. Max: INT32_MAX. Default: 60. First appeared: 0.3.0

"guard-max-sample-threshold-percent" -- Largest bandwidth-weighted fraction of guards that clients should try to collect in their sample. Min: 1. Max: 100. Default: 20. First appeared: 0.3.0

"guard-meaningful-restriction-percent" -- If the client has configured tor to exclude so many guards that the available guard bandwidth is less than this percentage of the total, treat the guard sample as "restricted", and keep it in a separate sample. Min: 1. Max: 100. Default: 20. First appeared: 0.3.0

"guard-extreme-restriction-percent" -- Warn the user if they have configured tor to exclude so many guards that the available guard bandwidth is less than this percentage of the total. Min: 1. Max: 100. Default: 1. First appeared: 0.3.0. MAX was INT32_MAX, which would have no meaningful effect. MAX lowered to 100 in 0.4.7.

"guard-min-filtered-sample-size" -- If fewer than this number of guards is available in the sample after filtering out unusable guards, the client should try to add more guards to the sample (if allowed). Min: 1. Max: INT32_MAX. Default: 20. First appeared: 0.3.0

"guard-n-primary-guards" -- The number of confirmed guards that the client should treat as "primary guards". Min: 1. Max: INT32_MAX. Default: 3. First appeared: 0.3.0

"guard-n-primary-guards-to-use", "guard-n-primary-dir-guards-to-use" -- number of primary guards and primary directory guards that the client should be willing to use in parallel. Other primary guards won't get used unless the earlier ones are down. "guard-n-primary-guards-to-use": Min 1, Max INT32_MAX: Default: 1. "guard-n-primary-dir-guards-to-use" Min 1, Max INT32_MAX: Default: 3. First appeared: 0.3.0

"guard-nonprimary-guard-connect-timeout" -- When trying to confirm nonprimary guards, if a guard doesn't answer for more than this long in seconds, treat lower-priority guards as usable. Min: 1. Max: INT32_MAX. Default: 15 First appeared: 0.3.0

"guard-nonprimary-guard-idle-timeout" -- When trying to confirm nonprimary guards, if a guard doesn't answer for more than this long in seconds, treat it as down. Min: 1. Max: INT32_MAX. Default: 600 First appeared: 0.3.0

"guard-remove-unlisted-guards-after-days" -- If a guard has been unlisted in the consensus for at least this many days, remove it from the sample. Min: 1. Max: 3650. Default: 20. First appeared: 0.3.0

Obsolete parameters

"NumDirectoryGuards", "NumEntryGuards" -- Number of guard nodes clients should use by default. If NumDirectoryGuards is 0, we default to NumEntryGuards. NumDirectoryGuards: Min: 0. Max: 10. Default: 0 NumEntryGuards: Min: 1. Max: 10. Default: 3 First-appeared: 0.2.4.23, 0.2.5.6-alpha Removed in: 0.3.0

"GuardLifetime" -- Duration for which clients should choose guard nodes, in seconds. Min: 30 days. Max: 1826 days. Default: 60 days. First-appeared: 0.2.4.12-alpha Removed in: 0.3.0.

"UseNTorHandshake" -- If true, then versions of Tor that support NTor will prefer to use it by default. Min: 0, Max: 1. Default: 1. First-appeared: 0.2.4.8-alpha Removed in: 0.2.9.

"Support022HiddenServices" -- Used to implement a mass switch-over from sending timestamps to hidden services by default to sending no timestamps at all. If this option is absent, or is set to 1, clients with the default configuration send timestamps; otherwise, they do not. Min: 0, Max: 1. Default: 1. First-appeared: 0.2.4.18-rc Removed in: 0.2.6

Glossary

The Tor Project

This document aims to specify terms, notations, and phrases related to Tor, as used in the Tor specification documents and other documentation.

This glossary is not a design document; it is only a reference.

This glossary is a work-in-progress; double-check its definitions before citing them authoritatively. ;)

Table of Contents

    0. Preliminaries
    1.0. Commonly used Tor configuration terms
    2.0. Tor network components
        2.1. Relays, aka OR (onion router)
            2.1.1. Specific roles
        2.2. Client, aka OP (onion proxy)
        2.3. Authorities
        2.4. Hidden Service
        2.5. Circuit
        2.6. Edge connection
        2.7. Consensus
        2.8. Descriptor
    3.0. Tor network protocols
        3.1. Link handshake
        3.2. Circuit handshake
        3.3. Hidden Service Protocol
        3.4. Directory Protocol
    4.0. General network definitions

Preliminaries

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Commonly used Tor configuration terms

ORPort - Onion Router Port

DirPort - Directory Port

Tor network components

Relays, aka OR (onion router)

[Style guide: prefer the term "Relay"]

Specific roles

Exit relay: The final hop in an exit circuit before traffic leaves the Tor network to connect to external servers.

Non-exit relay: Relays that send and receive traffic only to other Tor relays and Tor clients.

Entry relay: The first hop in a Tor circuit. Can be either a guard relay or a bridge, depending on the client's configuration.

Guard relay: A relay that a client uses as its entry for a longer period of time. Guard relays are rotated more slowly to prevent attacks that can come from being exposed to too many guards.

Bridge: A relay intentionally not listed in the public Tor consensus, with the purpose of circumventing entities (such as governments or ISPs) seeking to block clients from using Tor. Currently, bridges are used only as entry relays.

Directory cache: A relay that downloads cached directory information from the directory authorities and serves it to clients on demand. Any relay will act as a directory cache, if its bandwidth is high enough.

Rendezvous point: A relay connecting a client to a hidden service. Each party builds a three-hop circuit, meeting at the rendezvous point.

Client, aka OP (onion proxy)

[Style: the "OP" and "onion proxy" terms are deprecated.]

Authorities:

Directory Authority: Nine total in the Tor network, operated by trusted individuals. Directory authorities define and serve the consensus document, defining the "state of the network." This document contains a "router status" section for every relay currently in the network. Directory authorities also serve router descriptors, extra info documents, microdescriptors, and the microdescriptor consensus.

Bridge Authority: One total. Similar in responsibility to directory authorities, but for bridges.

Fallback directory mirror: One of a list of directory caches distributed with the Tor software. (When a client first connects to the network, and has no directory information, it asks a fallback directory. From then on, the client can ask any directory cache that's listed in the directory information it has.)

Hidden Service:

A hidden service is a server that will only accept incoming connections via the hidden service protocol. Connection initiators will not be able to learn the IP address of the hidden service, allowing the hidden service to receive incoming connections, serve content, etc, while preserving its location anonymity.

Circuit:

An established path through the network, where cryptographic keys are negotiated using the ntor protocol or TAP (Tor Authentication Protocol (deprecated)) with each hop. Circuits can differ in length depending on their purpose. See also Leaky Pipe Topology.

Origin Circuit -

Exit Circuit: A circuit which connects clients to destinations outside the Tor network. For example, if a client wanted to visit duckduckgo.com, this connection would require an exit circuit.

Internal Circuit: A circuit whose traffic never leaves the Tor network. For example, a client could connect to a hidden service via an internal circuit.

Edge connection:

Consensus: The state of the Tor network, published every hour, decided by a vote from the network's directory authorities. Clients fetch the consensus from directory authorities, fallback directories, or directory caches.

Descriptor: Each descriptor represents information about one relay in the Tor network. The descriptor includes the relay's IP address, public keys, and other data. Relays send descriptors to directory authorities, who vote and publish a summary of them in the network consensus.

Tor network protocols

Link handshake

The link handshake establishes the TLS connection over which two Tor participants will send Tor cells. This handshake also authenticates the participants to each other, possibly using Tor cells.

Circuit handshake

Circuit handshakes establish the hop-by-hop onion encryption that clients use to tunnel their application traffic. The client does a pairwise key establishment handshake with each individual relay in the circuit. For every hop except the first, these handshakes tunnel through existing hops in the circuit. Each cell type in this protocol also has a newer version (with a "2" suffix), e.g., CREATE2.

CREATE cell: First part of a handshake, sent by the initiator.

CREATED cell: Second part of a handshake, sent by the responder.

EXTEND cell: (also known as a RELAY_EXTEND cell) First part of a handshake, tunneled through an existing circuit. The last relay in the circuit so far will decrypt this cell and send the payload in a CREATE cell to the chosen next hop relay.

EXTENDED cell: (also known as a RELAY_EXTENDED cell) Second part of a handshake, tunneled through an existing circuit. The last relay in the circuit so far receives the CREATED cell from the new last hop relay and encrypts the payload in an EXTENDED cell to tunnel back to the client.

Onion skin: A CREATE/CREATE2 or EXTEND/EXTEND2 payload that contains the first part of the TAP or ntor key establishment handshake.
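
The message flow between the four cell types above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the `Cell` and `Relay` classes, the `build_circuit` helper, and the string payloads are hypothetical names invented here, no cryptography or real cell encoding is performed, and the tunneling of EXTEND/EXTENDED cells through earlier hops is elided.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    command: str  # "CREATE", "CREATED", "EXTEND", or "EXTENDED"
    payload: str  # placeholder for the onion skin or the handshake reply


class Relay:
    """Hypothetical relay that only tracks which handshake cells it handled."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: List[str] = []

    def on_create(self, cell: Cell) -> Cell:
        # Responder side: answer a CREATE with a CREATED.
        assert cell.command == "CREATE"
        self.seen.append(cell.command)
        return Cell("CREATED", f"reply({cell.payload}) from {self.name}")

    def on_extend(self, cell: Cell, next_hop: "Relay") -> Cell:
        # Last hop so far: re-send the EXTEND payload as a CREATE to the
        # next hop, then wrap the CREATED reply in an EXTENDED cell.
        assert cell.command == "EXTEND"
        self.seen.append(cell.command)
        created = next_hop.on_create(Cell("CREATE", cell.payload))
        return Cell("EXTENDED", created.payload)


def build_circuit(path: List[Relay]) -> List[Cell]:
    """Return the replies the client receives while building the path."""
    replies = [path[0].on_create(Cell("CREATE", "onionskin-0"))]
    for i in range(1, len(path)):
        last_hop, new_hop = path[i - 1], path[i]
        replies.append(last_hop.on_extend(Cell("EXTEND", f"onionskin-{i}"), new_hop))
    return replies
```

In this sketch the client talks directly to the first relay with CREATE, and every later hop is reached by asking the current last hop to extend, so the client sees one CREATED reply followed by EXTENDED replies.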

Hidden Service Protocol

Directory Protocol

General network definitions

Leaky Pipe Topology: The ability for the origin of a circuit to address relay cells to any hop in the path of a circuit. In Tor, the destination hop is determined by using the 'recognized' field of relay cells.
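
As a rough illustration of the idea, the sketch below shows how an origin could layer encryption so that only the intended hop sees a zeroed 'recognized' field after removing its layer. Everything here is an assumption made for illustration: the function names, the toy cell layout (2-byte recognized field, 4-byte digest, payload), the SHAKE-256 keystream, and the reuse of one key per hop for both encryption and digest are not the real cell format or ciphers, and a real implementation keeps running cipher and digest state across cells.

```python
import hashlib
from typing import List, Optional, Tuple

RECOGNIZED = b"\x00\x00"  # toy 'recognized' field: zero means "for this hop"


def xor_layer(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR with a SHAKE-256 keystream derived from the key.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def toy_digest(key: bytes, payload: bytes) -> bytes:
    # Toy per-hop digest used to rule out accidental 'recognized' matches.
    return hashlib.sha256(key + payload).digest()[:4]


def origin_wrap(hop_keys: List[bytes], target: int, payload: bytes) -> bytes:
    """Build a cell for hop `target`, adding one layer per hop up to it."""
    cell = RECOGNIZED + toy_digest(hop_keys[target], payload) + payload
    for key in reversed(hop_keys[: target + 1]):
        cell = xor_layer(key, cell)
    return cell


def route_forward(hop_keys: List[bytes], cell: bytes) -> Optional[Tuple[int, bytes]]:
    """Each hop removes its layer; the first hop that recognizes the cell keeps it."""
    for index, key in enumerate(hop_keys):
        cell = xor_layer(key, cell)
        recognized, digest, payload = cell[:2], cell[2:6], cell[6:]
        if recognized == RECOGNIZED and digest == toy_digest(key, payload):
            return index, payload
    return None  # no hop recognized the cell
```

A call such as `route_forward(keys, origin_wrap(keys, 1, b"hello"))` would stop at the second hop in this toy model, because earlier hops see what looks like random bytes in the recognized and digest positions.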

Stream: A single application-level connection or request, multiplexed over a Tor circuit. A 'Stream' can currently carry the contents of a TCP connection, a DNS request, or a Tor directory request.

Channel: A pairwise connection between two Tor relays, or between a client and a relay. Circuits are multiplexed over Channels. All channels are currently implemented as TLS connections.
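
The nesting described in the Stream and Channel entries (streams multiplexed over circuits, circuits multiplexed over channels) can be pictured with a small data-structure sketch. The `Channel` and `Circuit` classes, the `deliver` method, and the integer identifiers are hypothetical and chosen only for illustration; they do not reflect any real implementation's types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Circuit:
    circ_id: int
    # stream ID -> data chunks received for that stream (toy model)
    streams: Dict[int, List[bytes]] = field(default_factory=dict)


@dataclass
class Channel:
    peer: str
    # circuit ID -> circuit carried over this channel (toy model)
    circuits: Dict[int, Circuit] = field(default_factory=dict)

    def deliver(self, circ_id: int, stream_id: int, data: bytes) -> None:
        # Demultiplex first by circuit, then by stream within the circuit.
        circuit = self.circuits.setdefault(circ_id, Circuit(circ_id))
        circuit.streams.setdefault(stream_id, []).append(data)
```

Note that in this model the same stream ID can appear on two different circuits without conflict, since streams are looked up within their circuit.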

Tor specifications (draft mdbook port)