<br><br><div class="gmail_quote">On Mon, Jun 29, 2009 at 3:09 PM, Paolo Bonzini <span dir="ltr">&lt;<a href="mailto:paolo.bonzini@gmail.com">paolo.bonzini@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On 06/29/2009 11:08 PM, Eliot Miranda wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On reading this my first question is "what should at: do?". Have you<br>
thought this through? Does at: have to search for TAG marks and skip<br>
over them, or is the problem punted up to the client?<br>
</blockquote>
<br></div>
Tags are zero-width Unicode characters, just like the byte-order mark U+FEFF. Note that tags use a completely different set of characters (the Tags block, U+E0000..U+E007F) from the normal Latin alphabet. Just as in UTF-8/UTF-16 it is possible to find the beginning of a character in O(1) time, in this RFC it is always clear whether a character is part of a tag or not.</blockquote>
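<div>The two O(1) properties mentioned above can be sketched in Python as follows. This is a minimal illustration, not the RFC's definition: it assumes tags are drawn from the Unicode Tags block (U+E0000..U+E007F) and that text is stored as UTF-8 bytes. The names <code>is_tag_char</code>, <code>utf8_char_start</code> and <code>language_tag</code> are hypothetical helpers introduced only for this example.</div>

```python
# Minimal sketch, assuming language tags are encoded with the Unicode
# Tags block (U+E0000..U+E007F) and the text is stored as UTF-8 bytes.
# All function names here are hypothetical, for illustration only.

TAG_FIRST = 0xE0000  # first code point of the Tags block
TAG_LAST = 0xE007F   # CANCEL TAG, last code point of the Tags block


def is_tag_char(cp: int) -> bool:
    """O(1): is code point cp part of a tag (i.e. in the Tags block)?"""
    return TAG_FIRST <= cp <= TAG_LAST


def utf8_char_start(buf: bytes, i: int) -> int:
    """Return the index of the lead byte of the UTF-8 sequence containing buf[i].

    Continuation bytes match 10xxxxxx; in valid UTF-8 a sequence is at most
    4 bytes long, so this loop runs at most 3 times: O(1).
    """
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def language_tag(tag: str) -> str:
    """Hypothetical helper: LANGUAGE TAG (U+E0001) followed by the ASCII
    characters of `tag`, each shifted into the Tags block."""
    return chr(0xE0001) + "".join(chr(0xE0000 + ord(c)) for c in tag)


if __name__ == "__main__":
    s = language_tag("en") + "hi"
    print([is_tag_char(ord(c)) for c in s])  # [True, True, True, False, False]
    buf = s.encode("utf-8")                  # each tag char is 4 bytes in UTF-8
    print(utf8_char_start(buf, 5))           # 4: byte 5 continues the char at 4
```

<div>Both checks look only at a bounded neighbourhood of one position; neither says anything about how many characters precede that position.</div>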
<div><br></div><div>But being able to find the start of a character in O(1) doesn't tell you how many characters lie between a given address within the string and the string's start address, nor what the address of the character at a given index is. So if the TAG representation is the internal representation (which I think is implied by using it to carry language information around with the character data), then at: is O(N). That means it is either suitable only as an exchange representation (and expensive to encode/decode to and from), or it needs an additional index structure, or...?</div>
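<div>A rough Python sketch of the cost described above, under stated assumptions: tags (Unicode Tags block, U+E0000..U+E007F) are stored inline in the string, and at: should count only non-tag characters, 1-based as in Smalltalk. <code>at_linear</code> scans on every call; the hypothetical <code>TaggedString</code> wrapper shows the "additional index structure" alternative. Neither is taken from the RFC or from any Smalltalk implementation.</div>

```python
# Minimal sketch, assuming tags (Unicode Tags block, U+E0000..U+E007F) are
# stored inline in the string and at: counts only non-tag characters,
# 1-based as in Smalltalk. Python str indexing by code point is O(1), so the
# scan below is the only extra cost; a UTF-8 byte buffer would add another.
# All names are hypothetical, for illustration only.

from typing import List

TAG_FIRST, TAG_LAST = 0xE0000, 0xE007F


def is_tag_char(ch: str) -> bool:
    return TAG_FIRST <= ord(ch) <= TAG_LAST


def at_linear(s: str, index: int) -> str:
    """at: without an index: every call scans from the start, O(N)."""
    seen = 0
    for ch in s:
        if not is_tag_char(ch):
            seen += 1
            if seen == index:
                return ch
    raise IndexError(index)


class TaggedString:
    """Hypothetical wrapper: pay O(N) once to record where the non-tag
    characters are, after which at() is O(1) at the cost of O(N) memory."""

    def __init__(self, s: str) -> None:
        self.s = s
        self.positions: List[int] = [
            i for i, ch in enumerate(s) if not is_tag_char(ch)
        ]

    def size(self) -> int:
        return len(self.positions)

    def at(self, index: int) -> str:
        if not 1 <= index <= len(self.positions):
            raise IndexError(index)
        return self.s[self.positions[index - 1]]


if __name__ == "__main__":
    def tag(t: str) -> str:
        # LANGUAGE TAG followed by t shifted into the Tags block
        return chr(0xE0001) + "".join(chr(0xE0000 + ord(c)) for c in t)

    s = tag("en") + "ab" + tag("fr") + "cd"
    print(at_linear(s, 3))               # c
    ts = TaggedString(s)
    print(ts.size(), ts.at(3))           # 4 c
```

<div>The trade-off is the one the question raises: either every at: pays for a scan, or the string carries a side table that has to be rebuilt whenever the text or its tags change.</div>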
<div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br><font color="#888888">
<br>
Paolo<br>
</font></blockquote></div><br>