Over the past two weeks our main focus has been updating all clients to PoC5 compatibility, and it’s definitely been a long road. Changes to the virtual machine include:
- The new init/code mechanism: Basically, when you create a contract, the provided code will be executed immediately, and then the return value of that code will be what becomes the contract code. This allows us to have a contract initialization code, but still keep the same format of [nonce, price, gas, to, value, data] for both transactions and contract creation, which also makes it easy to create new contracts via forwarding contracts
- Reordering of transaction and contract data: the order is now [nonce, price, gas, to, value, data] in transactions and [gas, to, value, datain, datainsz, dataout, dataoutsz] in messages. Note that Serpent keeps its existing parameter order for send(to, value, gas), o = msg(to, value, gas, datain, datainsz) and i = msg(to, value, gas, datain, datainsz, dataoutsz).
- Fee adjustments: transaction creation now has a fee of 500 gas, and various other fees have been updated.
- The CODECOPY and CALLDATACOPY opcodes: CODECOPY takes code_index, mem_index, len as arguments and copies the code from code_index … code_index+len-1 to memory mem_index … mem_index+len-1. These opcodes are particularly useful when combined with init/code. There is also now CODESIZE.
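To make the CODECOPY description concrete, here is a minimal Python sketch of the copy it describes. The function name, the zero-padding when reading past the end of the code, and the way memory grows are assumptions of this sketch, not client behavior taken from the spec.

```python
def codecopy(code: bytes, memory: bytearray,
             code_index: int, mem_index: int, length: int) -> None:
    """Copy code[code_index .. code_index+length-1] into
    memory[mem_index .. mem_index+length-1]."""
    end = mem_index + length
    # Assumption: memory grows with zero bytes when the target range
    # runs past its current end.
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    chunk = code[code_index:code_index + length]
    # Assumption: reads past the end of the code are padded with zeros.
    chunk = chunk + bytes(length - len(chunk))
    memory[mem_index:end] = chunk


memory = bytearray()
codecopy(b"ABCDEFGH", memory, 2, 0, 3)
print(bytes(memory))  # b'CDE'
```

CALLDATACOPY would look the same in this model, with the message data taking the place of the code.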
The largest changes, however, have been to the architecture surrounding the protocol. On the GUI side, the C++ and Go clients are evolving rapidly, and we’ll see more updates on that side very soon. If you’ve been following Ethereum closely, chances are you’ve seen Denny’s lottery, a complete implementation of a lottery, plus GUI, written and executed within the C++ client. From here, the C++ client will become more of a developer-oriented tool, while the Go client will begin to focus on being a user-facing application (or rather, a meta-application). On the compiler side, Serpent has undergone a number of substantial improvements.
First, the code. You can take a look at the Serpent compiler under the hood and see all of the functions available, along with their precise translations into EVM code. For example, we have:
['access', 2, 1,
    ['', '', 32, 'MUL', 'ADD', 'MLOAD']],
This means that what access(x,y) is doing under the hood is recursively compiling whatever x and y are, and then loading memory at index x + y * 32; therefore x is the pointer to the beginning of the array and y is the index. This code structure has been around since PoC4, but I’ve now updated the metalanguage used to describe translations even further, to include even if, while, and init/code in this construct (they were special cases before); now only set and seq remain as special cases, and if you wanted you could even remove seq by re-implementing it as a rewrite rule.
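The rewrite-rule idea can be sketched in a few lines of Python. This is an illustration only, not the Serpent compiler's actual code: the RULES table, the compile_expr function and the "<0>" / "<1>" placeholder strings for the compiled arguments are all hypothetical names chosen for this sketch.

```python
# Hypothetical rule table: name -> (argument count, output count, template).
# "<0>" and "<1>" are assumed placeholders for the compiled arguments.
RULES = {
    "access": (2, 1, ["<0>", "<1>", 32, "MUL", "ADD", "MLOAD"]),
}


def compile_expr(expr):
    """Recursively compile a nested-list expression into a flat op list."""
    if not isinstance(expr, list):
        return [expr]  # a literal is simply pushed
    name, args = expr[0], expr[1:]
    argc, _outputs, template = RULES[name]
    if len(args) != argc:
        raise ValueError(f"{name} expects {argc} arguments")
    out = []
    for item in template:
        if isinstance(item, str) and item.startswith("<") and item.endswith(">"):
            # Splice in the compiled form of the referenced argument.
            out.extend(compile_expr(args[int(item[1:-1])]))
        else:
            out.append(item)
    return out


print(compile_expr(["access", 64, 3]))
# [64, 3, 32, 'MUL', 'ADD', 'MLOAD']
```

Because arguments are compiled recursively, a nested call such as access(access(0, 1), 2) expands without any special handling.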
The largest changes so far have been for PoC5 compatibility. For example, if you run serpent compile_to_assembly 'return(msg.data[0]*2)', you’ll see:
["$begincode_0.endcode_0", "DUP", "MSIZE", "SWAP", "MSIZE", "$begincode_0", "CALLDATACOPY", "RETURN", "~begincode_0", "#CODE_BEGIN", 2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP", "MSIZE", "MSTORE", 32, "SWAP", "RETURN", "#CODE_END", "~endcode_0"]
The actual code is nothing more than:
[2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP", "MSIZE", "MSTORE", 32, "SWAP", "RETURN"]
If you want to see what’s happening here, suppose a message comes in whose first datum is 5. We then have:
2 -> Stack: [2]
0 -> Stack: [2, 0]
CALLDATALOAD -> Stack: [2, 5]
MUL -> Stack: [10]
MSIZE -> Stack: [10, 0]
SWAP -> Stack: [0, 10]
MSIZE -> Stack: [0, 10, 0]
MSTORE -> Stack: [0], Memory: [0, 0, 0 … 10]
32 -> Stack: [0, 32], Memory: [0, 0, 0 … 10]
SWAP -> Stack: [32, 0], Memory: [0, 0, 0 … 10]
RETURN
The last RETURN returns the 32 bytes of memory starting at 0, or [0, 0, 0 … 10], or the number 10.
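The trace above can be replayed with a small stack-machine sketch in Python. The operand order of each operation is inferred from the trace itself (for example, MSTORE pops the index first and then the value); it is an assumption of this sketch rather than a statement of the PoC5 spec, and only the operations used by the inner code are modeled.

```python
def run(ops, calldata: bytes) -> bytes:
    """Execute a flat op list; semantics are inferred from the trace above."""
    stack, memory = [], bytearray()
    for op in ops:
        if isinstance(op, int):
            stack.append(op)                      # literal push
        elif op == "CALLDATALOAD":
            i = stack.pop()
            word = calldata[i:i + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(word, "big"))
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop() % 2**256)
        elif op == "MSIZE":
            stack.append(len(memory))
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "MSTORE":
            index, value = stack.pop(), stack.pop()
            if len(memory) < index + 32:
                memory.extend(bytes(index + 32 - len(memory)))
            memory[index:index + 32] = value.to_bytes(32, "big")
        elif op == "RETURN":
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + size])
        else:
            raise ValueError(f"unsupported op: {op}")
    return b""


inner = [2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP",
         "MSIZE", "MSTORE", 32, "SWAP", "RETURN"]
result = run(inner, (5).to_bytes(32, "big"))
print(int.from_bytes(result, "big"))  # 10
```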
Now, let’s look at the wrapper code.
["$begincode_0.endcode_0", "DUP", "MSIZE", "SWAP", "MSIZE", "$begincode_0", "CALLDATACOPY", "RETURN", "~begincode_0", "#CODE_BEGIN", ….. , "#CODE_END", "~endcode_0"]
To make things clearer, I’ve elided the inner code. The first thing we see are two labels, ~begincode_0 and ~endcode_0, and the #CODE_BEGIN and #CODE_END guards. The labels mark the beginning and end of the inner code. The guards are there for the later stages of the compiler, which treat everything between them as a separate program. Now let’s look at the first section of the code. In this case, ~begincode_0 sits at position 10 and ~endcode_0 at position 24 in the final code. $begincode_0 and $endcode_0 are used to refer to these positions, and $begincode_0.endcode_0 refers to the length of the interval between them, 14. Now, remember that during contract initialization, the call data is the code you are passing in. Therefore, we have:
14 -> Stack: [14]
DUP -> Stack: [14, 14]
MSIZE -> Stack: [14, 14, 0]
SWAP -> Stack: [14, 0, 14]
MSIZE -> Stack: [14, 0, 14, 0]
10 -> Stack: [14, 0, 14, 0, 10]
CALLDATACOPY -> Stack: [14, 0], Memory: [ … ]
RETURN
Notice how the first half of the code cleverly sets up the stack so that it copies the inner code to memory indices 0…13, and then immediately returns that part of memory. In the final compiled code, 600e515b525b600a37f26002600035025b525b54602052f2, the inner code sits nicely to the right of the initializer code that simply returns it. In more complex contracts, initializers can also do things like set storage slots and call or create other contracts.
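To see the split between initializer and inner code in the compiled bytes, here is a short Python disassembler sketch. The opcode table is inferred purely from matching the hex string above against the assembly listings in this post (including the assumption that 0x60 pushes the one byte that follows); it is not an authoritative opcode table.

```python
# Opcode bytes inferred from the compiled hex in this post (assumption).
OPCODES = {
    0x02: "MUL", 0x35: "CALLDATALOAD", 0x37: "CALLDATACOPY",
    0x51: "DUP", 0x52: "SWAP", 0x54: "MSTORE",
    0x5b: "MSIZE", 0xf2: "RETURN",
}
PUSH1 = 0x60  # assumed: push the single byte that follows


def disassemble(code: bytes):
    """Turn compiled bytes back into a flat op list."""
    ops, i = [], 0
    while i < len(code):
        if code[i] == PUSH1:
            ops.append(code[i + 1])
            i += 2
        else:
            ops.append(OPCODES[code[i]])
            i += 1
    return ops


code = bytes.fromhex("600e515b525b600a37f26002600035025b525b54602052f2")
initializer, inner = code[:10], code[10:]  # positions 0-9 and 10-23
print(disassemble(initializer))
# [14, 'DUP', 'MSIZE', 'SWAP', 'MSIZE', 10, 'CALLDATACOPY', 'RETURN']
print(disassemble(inner))
# [2, 0, 'CALLDATALOAD', 'MUL', 'MSIZE', 'SWAP', 'MSIZE', 'MSTORE', 32, 'SWAP', 'RETURN']
```

The initializer is 10 bytes and the inner code is 14 bytes, which matches the 10 and 14 pushed by the wrapper.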
Now let us introduce Serpent’s latest and most fun feature: imports. A common use case in contract land is that you want to give a contract the ability to create new contracts. The problem is, how do you put the code for the created contracts into the creator contract? Before, the only way to solve this problem was to compile the new contracts first and then put the compiled code into an array. Now there is an easier solution: import.
Put the following in returnten.se:
x = create(tx.gas - 100, 0, import(mul2.se))
return(msg(x, 0, tx.gas-100, [5], 1))
Now, put the following in mul2.se:
return(msg.data[0]*2)
Now if you run serpent compile returnten.se and execute the contract, you will see that, voila, it returns ten. It’s easy to understand why. The returnten.se contract creates an instance of the mul2.se contract and then calls it with the value 5. mul2.se, as the name suggests, is a doubler, so it returns 5*2 = 10. Note that import is not a function in the standard sense; x = import('123.se') will fail, and import only works in the very specific context of create.
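The create-then-call flow can be modeled with a toy Python sketch. Everything here (ToyChain, mul2_init, returnten) is a hypothetical stand-in for illustration: contracts are plain Python callables, and gas and value are ignored entirely.

```python
class ToyChain:
    """Toy model: a contract is just a Python callable stored by address."""

    def __init__(self):
        self.contracts = {}

    def create(self, init):
        # The init code runs once; whatever it returns becomes the
        # contract's code, mirroring the init/code mechanism.
        address = len(self.contracts) + 1
        self.contracts[address] = init()
        return address

    def msg(self, to, data):
        return self.contracts[to](data)


def mul2_init():
    # Stands in for the compiled mul2.se: the initializer just
    # returns the doubler body.
    def body(data):
        return data[0] * 2
    return body


def returnten(chain):
    x = chain.create(mul2_init)  # create(..., import(mul2.se))
    return chain.msg(x, [5])     # msg(x, ..., [5], 1)


print(returnten(ToyChain()))  # 10
```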
Now imagine that you are writing a 1000-line contract and want to split it across files. You can use inset to do this. Into outer.se, put:
if msg.data[0] == 1:
    inset(inner.se)
And in inner.se, put:
return(3)
Running serpent compile outer.se gives you a nice piece of compiled code that returns 3 if the msg.data[0] argument is equal to one. And that’s all there is to it.
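One way to picture inset is as textual inclusion that keeps the indentation of the call site. The Python sketch below works on that assumption; it is not how the Serpent compiler is known to implement it, and expand_insets along with the in-memory files dictionary are hypothetical names used so the example runs without touching disk.

```python
import re

# Matches a line consisting only of inset(<filename>), capturing its indent.
INSET = re.compile(r"^(\s*)inset\((.+?)\)\s*$")


def expand_insets(source: str, files: dict) -> str:
    """Replace each inset(name) line with the named file's contents,
    indented to the level of the inset call."""
    out = []
    for line in source.splitlines():
        match = INSET.match(line)
        if match:
            indent, name = match.group(1), match.group(2).strip()
            inner = expand_insets(files[name], files)  # allow nesting
            out.extend(indent + inner_line for inner_line in inner.splitlines())
        else:
            out.append(line)
    return "\n".join(out)


files = {
    "outer.se": "if msg.data[0] == 1:\n    inset(inner.se)",
    "inner.se": "return(3)",
}
print(expand_insets(files["outer.se"], files))
```

With these two files, the expanded source is the if statement with return(3) as its body.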
Upcoming Serpent updates include:
- An improvement to this mechanism so that it does not load the internal code twice if you try to import twice with the same filename
- String literals
- Space and code efficiency improvements for array literals
- A debugging decorator (i.e. a compiling function that tells you which lines of Serpent correspond to which bytes of compiled code)
However, in the short term, my own effort will focus on bug fixes, a suite of cross-client tests, and continued work on ethereumjs-lib.