- Tokenization
  - Do we need it at the protocol level for masking?
  - If we do, what should we use for text, images, video, and audio?
- Masking Ratios for Modalities
  - What are the optimal masking ratios for different modalities?
- Sharding
  - How many nodes in a shard?
- Market of Experts
  - Will people actually use this if it's not required?
- Tokenomics
  - What mechanism will we use to control the burn fee?
- Rewards
  - Specific reward amounts for embedders, validators, and data submissions
- Slashing
  - Validator slashing mechanics
  - Encoder slashing formula
- Multimodality
  - What is the embedding space's dimensionality?
  - Does our mechanism for multimodality work?
- Consensus
  - Do we need leaders for Mysticeti?
  - What state needs to be synced and how do we implement it?
- How normal people will use the network
- Obtaining embeddings at the requested dimensionality (Matryoshka)
- How will nodes validate the Proof of Curiosity game?
- Vector storage for nearest neighbor search
- k in Proof of Curiosity
  - What is the optimal value for k?
- How does price negotiation for embeddings work?
- Who calls the encoders from the validator nodes?
- How do batches of data work?
- Data formats and resizing data
- How can we fast sync efficiently while checkpointing state and using SPVs and Merkle trees?
- Practically, how do we store balances?
- How do we update state?
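
For the Matryoshka item above, one common way to serve an embedding at a requested dimensionality is to keep the leading coordinates of the full vector and re-normalize. The sketch below only illustrates that idea; it assumes the encoder was trained so that prefixes of the vector are meaningful on their own, and the function name `truncate_embedding` is hypothetical, not part of any existing design.

```python
import math
from typing import List, Sequence


def truncate_embedding(full: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates of `full` and L2-normalize them.

    Assumption: the model was trained Matryoshka-style, so a prefix of
    the full embedding is itself a usable lower-dimensional embedding.
    """
    if not 1 <= dim <= len(full):
        raise ValueError(f"dim must be in [1, {len(full)}], got {dim}")

    head = list(full[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    full_vector = [3.0, 4.0, 12.0]
    print(truncate_embedding(full_vector, 2))
```

Whether truncation happens at the encoder, the validator, or the client is still an open question; the sketch is agnostic about where it runs.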