Closed
102 changes: 101 additions & 1 deletion data-structures.md
@@ -221,7 +221,107 @@ Bitfields are a set of bits encoded using a custom run length encoding: rle+. r

#### `SectorSet`

TODO
The `SectorSet` is an integer set implemented with a simple array mapped tree.
Contributor

Let's call the data structure AMT and say that a

type SectorSet {UInt:SectorMeta}<AMT>

Integer indexes range from 0 to infinity (TODO practical bounds / bounds implied by encoding?).
Contributor Author

Is 64 bit int limitation OK?

Member

Probably worth setting some maximum based on the maximum number of sectors we expect to impose on miners
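
As a rough illustration of the bound discussed in this thread: assuming indexes are unsigned 64-bit integers and `S = 256`, a tree of depth `d` covers indexes below `256^d`, so no index needs more than 8 levels (`256^8 = 2^64`). The helper name `depthFor` is hypothetical and not part of the spec.

```go
package main

import "fmt"

const S = 256 // node width from the spec

// depthFor returns the smallest depth d such that index < S^d.
// Hypothetical helper for illustration only.
func depthFor(index uint64) uint {
	d := uint(1)
	for index >= S {
		index /= S
		d++
	}
	return d
}

func main() {
	// The largest uint64 index still fits in a tree of depth 8.
	fmt.Println(depthFor(255), depthFor(256), depthFor(^uint64(0))) // 1 2 8
}
```

A tighter maximum based on the expected number of sectors per miner would simply cap the depth below 8.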

The two basic structures in the `SectorSet` are the `Node` and
the `Pointer`. There is also a `RootNode`. The `Node` is as follows:
```go
type Node struct {
	Pointers [S]*Pointer
}
```
where `S` is the fixed node width, a constant set to `256`. The `Node`
is serialized as a cbor array (major type 4) of Pointers.

Contributor Author

Creating a new node always allocates S empty Pointers


[ipld dag-cbor Cids](https://github.com/ipld/specs/blob/master/Codecs/DAG-CBOR.md#link-format).

The `Pointer` is as follows:
```go
type Pointer struct {
	Value SectorMeta
	Link  Cid
}
```
The `Pointer` is serialized as a cbor object (major type 5). The `Value`
is serialized { in some way } with `v` as its object key. The `SectorMeta`
type carries all needed sector metadata (TODO: define a type for the commD
commR pair and its serialization). A `Pointer` is conceptually a union type:
it can carry a `Link` or a `Value`, but never both, in a well-formed `SectorSet`.

Contributor Author

Is this (commR, commD)?

Member

cc @laser

Contributor

SectorMeta will need:

  • CommR so that we can generate PoSts (the generate_post routine takes as input all CommRs in the miner's proving set).
  • CommD for piece inclusion proof verification.
Member

What's the object key used for the Link?


The `RootNode` is as follows:
```go
type RootNode struct {
	MaxDepth    uint
	NodePointer Cid
}
```
The `RootNode` is serialized as a cbor object (major type 5) where `NodePointer`
is serialized as a Cid and `MaxDepth` is a cbor unsigned int (major type 0). The
`NodePointer` field uses `p` as its object key, and the `MaxDepth` field uses
`d`.

Member

I don't think the rootnode needs to be serialized ever.

##### Lookup

Lookup takes an integer sector ID (the index) and returns the `Value` stored at
that index, if the index is present in the `SectorSet`. Each node has a
`height`: the node the `RootNode` points to has a `height` equal to the root
node's max depth, each child has a `height` one less than its parent's, and
leaf nodes have a `height` of 1.

At each node the next child is chosen by determining which ordered subtree the
index falls into. For a node of height `h`, the child's position in `Pointers`
is the integer quotient `index / S^(h-1)`, and the index used for the recursive
search on that child is the remainder `index % S^(h-1)`.
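
The quotient and remainder steps above can be sketched as a small worked example. This is only an illustration: the function name `slotPath` is hypothetical, and the sketch assumes `S = 256` and a height of at most 8 so that `S^(h-1)` fits in a `uint64`.

```go
package main

import "fmt"

const S = 256 // node width from the spec

// pow returns S^exp using integer arithmetic.
func pow(exp uint) uint64 {
	r := uint64(1)
	for i := uint(0); i < exp; i++ {
		r *= S
	}
	return r
}

// slotPath lists the Pointers position chosen at each level, top node first.
// Hypothetical helper; it assumes 1 <= height <= 8.
func slotPath(index uint64, height uint) []uint64 {
	path := make([]uint64, 0, height)
	for h := height; h >= 1; h-- {
		childRange := pow(h - 1)              // indexes covered by each child
		path = append(path, index/childRange) // quotient picks the child
		index %= childRange                   // remainder is searched in that child
	}
	return path
}

func main() {
	// 70000 = 1*256^2 + 17*256 + 112
	fmt.Println(slotPath(70000, 3)) // [1 17 112]
}
```

If the first entry of the path is `S` or greater, the index lies beyond the capacity of a tree of that height.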

Pseudocode (`ipldResolve` stands for loading the `Node` behind a link; handling of empty pointers and out-of-range indexes is omitted):
```go
// Lookup returns the value stored at index.
func (rn *RootNode) Lookup(index uint64) (SectorMeta, error) {
	// The node the root points to has height MaxDepth.
	top := ipldResolve(rn.NodePointer)
	return top.recLookup(rn.MaxDepth, index)
}

func (n *Node) recLookup(curHeight uint, index uint64) (SectorMeta, error) {
	// Number of indexes covered by each child: S^(curHeight-1).
	childRange := uint64(math.Pow(S, float64(curHeight-1)))
	pointerIndex := index / childRange

	p := n.Pointers[pointerIndex]
	if curHeight == 1 {
		// Leaf: childRange is 1, so pointerIndex is the remaining index.
		return p.Value, nil
	}
	newNode := ipldResolve(p.Link)

	return newNode.recLookup(curHeight-1, index%childRange)
}
```

Member

Why doesn't this mention the expansion?

Member

Oh, this is just for lookup, I see.

##### Expand

As the `SectorSet` grows, the tree must be expanded to hold values with higher
indexes. A tree whose root has max depth `d` can hold indexes below `S^d`.
When given an index `b` at or beyond this capacity, the `SectorSet` adds parent
nodes above the node pointed to by the root until the tree has capacity for
both its existing indexes and `b`. Pointers in these new nodes are set so that
there is a path from the new node with the greatest height down to the node the
root previously pointed to. Finally, the root node is updated to point to the
node with the greatest height, and its max depth is increased to match.
Contributor Author (@ZenGround0, Jun 7, 2019)

Note that as the SectorSet grows the first insert into the new region of the tree will start to require more and more overhead. Making this scale better probably requires a redesign of the data structure (we could do something closer to the HAMT with more complexity to improve this). Is the current design acceptable or should we try to address this now?

Member

logarithmic scaling is about optimal here.
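
A minimal in-memory sketch of the Expand step, under stated assumptions: children are held as Go pointers (`Child`) instead of Cid links, a `nil` entry stands for an empty `Pointer`, `Value` is a plain string rather than `SectorMeta`, and the names `Root`, `fits` and `Expand` are illustrative only. Because the existing node covers indexes `[0, S^d)`, it always ends up in position 0 of its new parent.

```go
package main

import "fmt"

const S = 256 // node width from the spec

// Pointer holds either a child (stand-in for Link) or a Value.
type Pointer struct {
	Child *Node  // in-memory stand-in for the Cid link
	Value string // stand-in for SectorMeta
}

type Node struct{ Pointers [S]*Pointer }

type Root struct {
	MaxDepth uint
	Node     *Node
}

// fits reports whether index < S^depth. From depth 8 on every uint64
// index fits, since 256^8 = 2^64.
func fits(index uint64, depth uint) bool {
	return depth >= 8 || index < uint64(1)<<(8*depth)
}

// Expand wraps the current top node in new parents until index fits.
func (r *Root) Expand(index uint64) {
	for !fits(index, r.MaxDepth) {
		parent := &Node{}
		parent.Pointers[0] = &Pointer{Child: r.Node} // old subtree covers the lowest range
		r.Node = parent
		r.MaxDepth++
	}
}

func main() {
	r := &Root{MaxDepth: 1, Node: &Node{}}
	old := r.Node
	r.Expand(70000) // 256^2 <= 70000 < 256^3, so depth 3 is needed
	fmt.Println(r.MaxDepth, r.Node.Pointers[0].Child.Pointers[0].Child == old) // 3 true
}
```

Each loop iteration adds one level, so the number of new nodes grows with the logarithm (base `S`) of the index.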


##### Insert Value

First, Expand the tree as needed for the input index.

Then run the Lookup traversal. If the traversal reaches a leaf node (height of
1), set the `Value` of the pointer at the leaf's `pointerIndex` to the inserted
value. At a leaf `childRange` is 1, so `pointerIndex` equals the remaining
index, that is, the parent's `index % childRange`.

If the traversal needs to resolve a pointer's link but that link does not
exist, create the missing nodes along the remaining path down to a new leaf
node, link each one from its parent, and set the `Value` of the leaf's pointer
at the same position to the inserted value.
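
The insert procedure can be sketched in memory as follows. This is an illustration under assumptions rather than the specified implementation: children are Go pointers instead of Cid links, `nil` stands for an empty `Pointer`, `Value` is a string standing in for `SectorMeta`, and `Root`, `fits`, `Insert` and `Get` are hypothetical names.

```go
package main

import "fmt"

const S = 256 // node width from the spec

type Pointer struct {
	Child *Node  // in-memory stand-in for the Cid link
	Value string // stand-in for SectorMeta
}

type Node struct{ Pointers [S]*Pointer }

type Root struct {
	MaxDepth uint
	Node     *Node
}

// fits reports whether index < S^depth (always true from depth 8 on).
func fits(index uint64, depth uint) bool {
	return depth >= 8 || index < uint64(1)<<(8*depth)
}

// Insert expands the tree if needed, then walks down, creating missing nodes.
func (r *Root) Insert(index uint64, value string) {
	for !fits(index, r.MaxDepth) { // Expand step
		parent := &Node{}
		parent.Pointers[0] = &Pointer{Child: r.Node}
		r.Node, r.MaxDepth = parent, r.MaxDepth+1
	}
	n := r.Node
	for h := r.MaxDepth; h > 1; h-- {
		childRange := uint64(1) << (8 * (h - 1)) // S^(h-1)
		slot := index / childRange
		if n.Pointers[slot] == nil {
			n.Pointers[slot] = &Pointer{Child: &Node{}} // create the missing child
		}
		n = n.Pointers[slot].Child
		index %= childRange
	}
	n.Pointers[index] = &Pointer{Value: value} // leaf: remaining index is the slot
}

// Get follows the same path and reports whether a value is present.
func (r *Root) Get(index uint64) (string, bool) {
	if !fits(index, r.MaxDepth) {
		return "", false
	}
	n := r.Node
	for h := r.MaxDepth; h > 1; h-- {
		childRange := uint64(1) << (8 * (h - 1))
		p := n.Pointers[index/childRange]
		if p == nil {
			return "", false
		}
		n, index = p.Child, index%childRange
	}
	if p := n.Pointers[index]; p != nil {
		return p.Value, true
	}
	return "", false
}

func main() {
	r := &Root{MaxDepth: 1, Node: &Node{}}
	r.Insert(3, "a")
	r.Insert(70000, "b")
	va, _ := r.Get(3)
	vb, _ := r.Get(70000)
	_, ok := r.Get(4)
	fmt.Println(r.MaxDepth, va, vb, ok) // 3 a b false
}
```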

##### Delete Value
Contributor Author

Lack of bitfields and constant sized arrays might make this prohibitively expensive as we have to search the whole array for emptiness (we could include a simple count to make this better). Also note that as specced out here users can arrive at two identical sets with different representations. I'd appreciate comments on the importance of addressing these issues.

Member

Iterating over some small constant number of items shouldn't be prohibitively expensive. Since we have to parse bytes into a data structure anyway, we can keep track of the item count in memory (but not for the serialized version).


Contributor Author

I'm not covering range deletes here. Let me know if that's a requirement for v1.

Run the Lookup traversal. If the value is found, delete it from the leaf's
`Pointers` array. If the `Pointers` array is empty after this deletion, update
the parent's pointer to this node to have a nil link. Continue up the tree,
removing the link to each node that has become empty, until reaching a node
that is not empty.
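
A minimal in-memory sketch of delete with pruning, under the same kind of assumptions as before: children are Go pointers instead of Cid links, `nil` stands for an empty `Pointer`, `Value` is a string standing in for `SectorMeta`, and `Root`, `fits`, `remove` and `Delete` are hypothetical names. Emptiness is checked by scanning all `S` entries, as discussed in the review thread, and the sketch does not reduce `MaxDepth`, since the text above does not describe that.

```go
package main

import "fmt"

const S = 256 // node width from the spec

type Pointer struct {
	Child *Node  // in-memory stand-in for the Cid link
	Value string // stand-in for SectorMeta
}

type Node struct{ Pointers [S]*Pointer }

type Root struct {
	MaxDepth uint
	Node     *Node
}

// fits reports whether index < S^depth (always true from depth 8 on).
func fits(index uint64, depth uint) bool {
	return depth >= 8 || index < uint64(1)<<(8*depth)
}

// remove deletes index below n (a node of height h). It returns true only
// if an entry was removed and n has no entries left afterwards.
func remove(n *Node, h uint, index uint64) bool {
	childRange := uint64(1) << (8 * (h - 1)) // S^(h-1)
	slot := index / childRange
	p := n.Pointers[slot]
	if p == nil {
		return false // index not present; nothing changes
	}
	if h == 1 || remove(p.Child, h-1, index%childRange) {
		n.Pointers[slot] = nil // drop the value, or the link to an emptied child
	}
	for _, q := range n.Pointers {
		if q != nil {
			return false
		}
	}
	return true
}

// Delete removes index if present and prunes nodes that become empty.
func (r *Root) Delete(index uint64) {
	if fits(index, r.MaxDepth) {
		remove(r.Node, r.MaxDepth, index)
	}
}

func main() {
	leafA, leafB, top := &Node{}, &Node{}, &Node{}
	leafA.Pointers[3] = &Pointer{Value: "a"} // index 3
	leafB.Pointers[5] = &Pointer{Value: "b"} // index 261 = 1*256 + 5
	top.Pointers[0] = &Pointer{Child: leafA}
	top.Pointers[1] = &Pointer{Child: leafB}
	r := &Root{MaxDepth: 2, Node: top}

	r.Delete(261)
	fmt.Println(top.Pointers[1] == nil, top.Pointers[0] != nil) // true true
}
```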

#### `FaultSet`
