code logs -> 2008 -> Sun, 09 Mar 2008
--- Log opened Sun Mar 09 00:00:04 2008
00:23 GeekSoldier|bed [~Rob@91.18.86.ns-26604] has quit [Ping Timeout]
00:24 GeekSoldier|bed [~Rob@91.18.86.ns-26604] has joined #code
00:29 GeekSoldier|bed [~Rob@91.18.86.ns-26604] has quit [Ping Timeout]
00:29 AnnoDomini [AnnoDomini@83.21.32.ns-4025] has quit [Quit: (...) By this point, the astute reader has picked up that Nethack isn't a "game" as much as an extremely prolonged and extremely elaborate form of masochism. Ask any serious player.]
01:30 Vornicus [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout]
01:31 Vornotron [~vorn@Admin.Nightstar.Net] has joined #code
01:31 You're now known as TheWatcher
04:50 You're now known as TheWatcher[zZzZ]
04:53 Vornotron is now known as Vornicus
04:54 Vornicus is now known as NSGuest-5480
04:55 NSGuest-5480 is now known as Vornicus
05:00 * Reiver finally clicks as to what the hell a tuple really is.
05:00 * Reiver can't believe he'd struggled with the concept, given it is distinctly 'Durrr' stuff. >.<
05:12
< Vornicus>
Heh
05:16 Reiver is now known as ReivShoppin
05:16
<@ReivShoppin>
Seriously!
05:17
<@ReivShoppin>
"A row of data"
05:17
<@ReivShoppin>
It'd had me puzzled in Python for ages >.>
05:27
< Vornicus>
*snrk*
05:34 Thaqui [~Thaqui@Nightstar-123.jetstream.xtra.co.nz] has joined #code
05:34 mode/#code [+o Thaqui] by ChanServ
06:38 ReivShoppin is now known as Reiver
07:00 GeekSoldier|bed [~Rob@Nightstar-8762.dip.t-dialin.net] has joined #code
07:03 Vornicus [~vorn@ServicesOp.Nightstar.Net] has quit [Ping Timeout]
07:04 GeekSoldier|bed is now known as GeekSoldier
07:07 Vornicus [~vorn@Admin.Nightstar.Net] has joined #code
07:07 mode/#code [+o Vornicus] by ChanServ
07:34 AnnoDomini [AnnoDomini@83.21.32.ns-4025] has joined #Code
07:34 mode/#code [+o AnnoDomini] by ChanServ
07:36 Vornicus is now known as Vornicus-Latens
08:05
<@jerith>
Reiver: I always thought of it as "a read-only list".
08:05
<@Reiver>
jerith: Yeah, well, I'd been trying to get my head around the concept.
08:05
<@Reiver>
Now I do, problem solved~
08:07
<@jerith>
:-)
09:00 GeekSoldier [~Rob@Nightstar-8762.dip.t-dialin.net] has quit [Ping Timeout]
09:09 Thaqui [~Thaqui@Nightstar-123.jetstream.xtra.co.nz] has left #code [Leaving]
09:41 GeekSoldier [~Rob@Nightstar-9089.dip.t-dialin.net] has joined #code
09:43 gnolam [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has joined #Code
09:43 mode/#code [+o gnolam] by ChanServ
09:54 Brother_Willibald [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has joined #Code
09:55 Brother_Willibald [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has quit [Quit: *poof*]
10:20 You're now known as TheWatcher
11:30 AnnoDomini [AnnoDomini@83.21.32.ns-4025] has quit [Ping Timeout]
11:31 AnnoDomini [AnnoDomini@83.21.28.ns-26444] has joined #Code
11:31 mode/#code [+o AnnoDomini] by ChanServ
12:29 eXeLaNCe [~dddd@88.245.15.ns-13237] has joined #code
13:03 eXeLaNCe [~dddd@88.245.15.ns-13237] has quit [Quit: ]
13:08
< Moltare>
Idly, lads, I'm looking for the regex that means "Anything, including spaces, tabs and newlines, that comes between /* and */"
13:08
< Moltare>
I thought it was "/*"[. \t\n]*"*/" , but that doesn't seem to cut the mustard
13:08
<@McMartin>
It doesn't because it's allowing */s in the middle of it.
13:09
<@McMartin>
That said
13:09
<@McMartin>
I seem to recall that Tiger allows nested comments, so you're going to have to be more cunning about this
13:09
< Moltare>
Tiger doesn't
13:09
< Moltare>
Or, wate
13:09
< Moltare>
Tiger does, but I'm not trying to build a lexer for Tiger
13:10
< Moltare>
It's for a dodgy homebrew that our lecturer made up
13:13
<@McMartin>
Ah, OK
13:13
<@McMartin>
The problem you've hit is that [. \t\n]* means "as many of any character you can munch"
13:13
<@McMartin>
That's the entire file
13:13
<@McMartin>
You need to exclude the "*/" sequence from that middle bit.
13:14
< Moltare>
Well, my /current/ problem is that it doesn't find a "/*" at all; it finds a division operator followed by a multiplication operator
13:14
<@McMartin>
Aha.
13:14
< Moltare>
Despite the quotes
13:14
<@McMartin>
You need to put the comment-matcher higher up in the flex file so it will have higher priority.
13:15
< Moltare>
It was already above the others
13:15 * Moltare puts it at the top instead
13:15
<@McMartin>
Hrm.
13:15
<@McMartin>
Maybe it then needs to be at the bottom~
13:22 * Moltare fiddle
13:24
< Moltare>
So, using "/*"[a-zA-Z0-9 \t\n]*"*/" to limit it to alphanumeric characters for the moment
13:25
< Moltare>
It recognises /* */ as a comment, but not /* comment */
13:26 * GeekSoldier tries to remember... "/blah/s"?
13:28
<@McMartin>
GeekSoldier: This is flex, not Perl
13:28
<@McMartin>
Moltare: OK, that boggles me
13:29
< GeekSoldier>
oh.
13:29 * GeekSoldier returns to his corner.
13:30
<@McMartin>
Maybe it needs spaces between the "s and the []s?
13:32
< Moltare>
no appreciable difference
13:34
<@McMartin>
Does the space need to be escaped?
13:34
< Moltare>
It doesn't for the "ignore spaces, tabs and newlines" entry
13:34
<@McMartin>
Blarghlecopter.
13:35
<@McMartin>
What does it think of /*c*/?
13:36
<@McMartin>
I'm wondering if it's somehow getting boggled by more than one character or something
13:37
< Moltare>
Breaks in the exact same way
13:37
< Moltare>
/**/ and /* */ are fine, /*c*/ and /* c */ are not
13:37
<@McMartin>
How about /* */?
13:38
< Moltare>
Fine
13:39
<@McMartin>
/* 123 */?
13:39
< Moltare>
Not
13:40
<@McMartin>
This is several varieties of aggravating
13:40
<@McMartin>
The TeXInfo implies that it should work
13:40
<@McMartin>
What happens if you remove the 0-9 and try your test cases?
13:41
< Moltare>
Same
13:41
<@McMartin>
In case this is some bizarre heinousness where - works for letters but not numbers
13:41
<@McMartin>
Rargh.
13:41
<@McMartin>
OK
13:41 * Moltare ponders, deletes everything but the .l file, recompiles from scratch just in case
13:43
< Moltare>
Ah, now it won't compile. This indicates progress of a backwards sort
13:43
<@McMartin>
Gnrk.
13:43
<@McMartin>
What's the error?
13:44
< Moltare>
FXD
13:44
< Moltare>
Spaces where they shouldn't be
13:45
< Moltare>
/**/: works /* */: works /*c*/: works /* c */: works
13:45
<@McMartin>
... OK.
13:45
<@McMartin>
And now /* 123 */ won't.
13:45
< Moltare>
And the compiler was reusing an old version of something
13:46
< Moltare>
I put that back in, McM
13:46
<@McMartin>
Aha.
13:46
<@McMartin>
Good times then
13:47
< Moltare>
Now I need to change it from "alphanumeric characters" to "anything that isn't */", I take it
13:48
<@McMartin>
Right.
13:48
<@ToxicFrog>
Yes.
13:48
<@McMartin>
But, of course, /********/ needs to be legal.
13:48
<@McMartin>
So you can't just do [^*]*
13:50
< Moltare>
What about [^"*/"]*? Or will that just compare every character to */ and therefore never fire?
13:51
<@ToxicFrog>
That is "everything but ", *, and /", except that since flex uses " as well it probably won't work period
13:51
< Moltare>
ah
13:52
<@ToxicFrog>
I'm not entirely sure it's possible to handle /* comments */ using just regexes
13:52
<@McMartin>
It is.
13:52
<@McMartin>
You have to be a rat bastard about it, but it's doable.
13:52
<@McMartin>
What *isn't* doable with raw regex is nested comments.
13:53
<@McMartin>
Flex can do it by abusing state variables to make it limited-context-free, but.
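McMartin's point about nested comments can be made concrete: a regular expression cannot count how deep it is, but a lexer with a counter can. The C function below is a minimal hand-written sketch, not flex code and not how any particular Tiger implementation does it. In flex the same idea is usually written with an exclusive start condition (declared with %x) plus a depth variable. The function name match_nested_comment is hypothetical.

```c
#include <stddef.h>

// Length of a possibly nested block comment at the start of s, or 0 if s
// does not begin with one or the nesting never closes. A plain regex cannot
// count, so this keeps an explicit depth counter instead.
size_t match_nested_comment(const char *s)
{
    if (s[0] != '/' || s[1] != '*')
        return 0;                                  // not a comment opener
    int depth = 1;
    size_t i = 2;
    while (s[i] != '\0') {
        if (s[i] == '/' && s[i + 1] == '*') {      // another opener
            depth++;
            i += 2;
        } else if (s[i] == '*' && s[i + 1] == '/') { // a closer
            depth--;
            i += 2;
            if (depth == 0)
                return i;                          // outermost comment closed
        } else {
            i++;                                   // ordinary body character
        }
    }
    return 0;                                      // unterminated comment
}
```

With nesting, "/* /* */ */" is one 11-character comment, whereas the non-nesting rule McMartin describes a little later treats the first "*/" as the end.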
13:53
< Moltare>
My rat bastardry skills are weak, as you may have noted ¬¬
13:54
<@McMartin>
Basically, * is allowed *as long as it isn't followed by a slash*.
13:55
<@McMartin>
And you know how to say "something that isn't a slash"
13:57
< Moltare>
So it's ((^*|^/)|(*^/))* ?
13:57
< Moltare>
Not an asterisk or slash, or asterisk as long as a slash doesn't follow
13:58
<@McMartin>
That doesn't look remotely like flex syntax
13:58
< Moltare>
I used the wrong shape brackets there ¬¬
13:59
<@McMartin>
^ outside of the square brackets means "match the beginning of a line"
13:59
<@McMartin>
Also, /*////////*/ is an acceptable comment.
14:01
< Moltare>
[[^*|^/]|[*^/]|[^*/]]* ? That just complains at me ¬¬
14:04
<@McMartin>
Yeah, you can't nest []s.
14:05
< Moltare>
How irritating.
14:05
<@McMartin>
[] is a special case in its own right.
14:05
<@McMartin>
[^*|^*/] is not an OR.
14:05
<@McMartin>
If you want "neither * nor /" that's [^*/].
14:05
< Moltare>
Is that not "not *, followed by /"?
14:06
<@McMartin>
No.
14:06
<@McMartin>
Because [^*/] matches a single character.
14:06
<@McMartin>
Specifically, any character that is not * or /.
14:06
<@McMartin>
It's equivalent to [^/*].
14:06
< Moltare>
Alright, then
14:08
< Moltare>
So how do I create ( /* followed by (either not * and not /, or * that isn't followed by /, or / that isn't preceded by *) an arbitrary number of times followed by */ ), then?
14:08
< Moltare>
(The lex manual I have here claims that | is an OR, idly)
14:08
<@McMartin>
Well, "/*" followed by (something) is easy.
14:08
<@McMartin>
Yes, | is indeed or.
14:09
<@McMartin>
However, when part of a [] token, | is "the vertical bar character"
14:09
<@McMartin>
Also, your last bit of the spec is wonky.
14:09
<@McMartin>
/* /* */ is a valid comment.
14:09
<@McMartin>
/* /* */ */ is a valid comment followed by a * and a /.
14:10
<@McMartin>
All that said
14:11
< Moltare>
/* /* */ fits, surely? It's /*, followed by / that is preceded by a space, followed by * that is followed by a space, followed by */
14:11
<@McMartin>
You can use () to group stuff up
14:11
<@McMartin>
Oh, I see, I missed the "preceded"
14:11
<@McMartin>
Try a rephrase.
14:12
<@McMartin>
"Either a single character that isn't *, or a * followed by something that isn't /..."
14:13
< Moltare>
"/*"(^*|*^/)*"*/" is what I'd got it down to
14:13
<@McMartin>
Close.
14:13
<@McMartin>
You're missing some []s in strategic locations.
14:13
<@McMartin>
And possibly some ""s.
14:14
< Moltare>
¬¬ Much as I appreciate the help, I begin to see why asking for it drives Jaci insane. I'm not looking to learn, here, I just want the damn thing working so I can put it in my past :P
14:14 * Moltare fiddle some more, then
14:15
<@McMartin>
Moltare: And I've TAed this very class twice, and so I am deliberately nerfing myself, acting as if you were somebody wandering into my office hours.
14:15
<@McMartin>
I rather suspect this isn't going to help the attitude problems much.
14:15
< Moltare>
heh
14:16
< Moltare>
Victory, all the same
14:16
<@McMartin>
Good show
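For reference, the pattern usually written for non-nested comments in flex is "/*"([^*]|"*"+[^*/])*"*"+"/". After the opening /*, the body is any run of characters that are either not *, or one or more stars followed by something other than / or *. The comment then ends with one or more stars and a slash. Note that inside a bracket expression a . is a literal dot, so the original [. \t\n] only ever matched dots and whitespace. Negated classes such as [^*] do match newlines in flex. The C function below is a hypothetical hand-written equivalent of that pattern as a two-state machine. It is handy for checking cases like /********/ and /*////////*/ outside flex.

```c
#include <stddef.h>

// Length of the (non-nested) block comment at the start of s, or 0 if s
// does not begin with a complete comment. Inside the body a '*' only
// matters when the very next character is '/'.
size_t match_block_comment(const char *s)
{
    if (s[0] != '/' || s[1] != '*')
        return 0;                 // must open with slash-star
    int after_star = 0;           // 1 if the previous body character was '*'
    for (size_t i = 2; s[i] != '\0'; i++) {
        if (after_star && s[i] == '/')
            return i + 1;         // the first star-slash closes the comment
        after_star = (s[i] == '*');
    }
    return 0;                     // reached end of input: unterminated
}
```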
14:17
<@McMartin>
I'm afraid I can't be a lot of help with a C-based recursive descent parser, though I can point you at ones that I wrote in Java and OCaml. The principles should be similar. =P
14:19 * Moltare applies hard-won knowledge, fixes his string definition into the bargain
14:21
< Moltare>
Ahh.. or not. Because a string "foo" currently reports as a string with value "foo" rather than a string with value foo...
14:21 * Moltare attempts to solo this one, first
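One way to get foo rather than "foo" is to copy what lies between the two quote characters, because the matched text includes both of them. This is a minimal sketch that assumes a simple string rule with no escape sequences. The helper unquote is hypothetical. In a flex action it would be called with yytext and yyleng, the length of the current match. Copying also matters for another reason: flex reuses the buffer that yytext points into.

```c
#include <stdlib.h>
#include <string.h>

// Returns a newly malloc'd copy of a double-quoted lexeme with the quotes
// removed, e.g. the 5-character text "foo" becomes the 3-character foo.
// Returns NULL if the text is not quoted or memory runs out. Escape
// sequences such as \" are deliberately not handled in this sketch.
char *unquote(const char *text, size_t len)
{
    if (len < 2 || text[0] != '"' || text[len - 1] != '"')
        return NULL;
    size_t n = len - 2;                 // characters between the quotes
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, text + 1, n);
    out[n] = '\0';                      // C strings end with a NUL byte
    return out;
}
```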
14:23 * McMartin goes to deal with breakfast
14:35
< Moltare>
Doesn't help that every time I go to write 'lexer' I write 'lever'
14:35
< Moltare>
Did it that time too
14:44
< Moltare>
Lunch!
15:06 * gnolam snerks.
15:06
<@gnolam>
http://www.imdb.com/name/nm2469945/
15:09
<@McMartin>
Hmm, and because I seem to have neglected to quote it in here:
15:09
<@McMartin>
"The defense grid can be full of lasery doom. The defense grid is not full of lasery doom."
15:12 * Moltare replaces his printfs with something more useful to the nascent parser
15:13
< Moltare>
Understanding check, plz?
15:13
< Moltare>
A rule should return a T_SOMEKINDOFTOKEN and possibly an associated yylval
15:14
<@McMartin>
Urgh. I haven't used flex proper in long enough to be able to answer that with confidence.
15:14
< Moltare>
There is also a .h file, whatever one of those is, that lists structures of T_ALLTHETOKENS
15:14
<@McMartin>
That sounds about right.
15:14
< Moltare>
ie type, value
15:14
<@McMartin>
Yeah
15:15
< Moltare>
Then the parser itself calls the yyparse() thing generated in lex.yy.c by flex, breaks it into left,right&centre and recurses it in the face
15:16
< Moltare>
erm, yylex() thing
15:16
<@McMartin>
I don't recall if the actual token return ends up in a global too or not
15:16
< Moltare>
and a hash table is involved to check if variables are present or not
15:17
<@McMartin>
Well, that's your doing, not flex's.
15:17
< Moltare>
The hash table? yes
15:17
< Moltare>
I've got a fragment of code here: struct token { char *lexeme; int type; int value; }, but no idea what I'm supposed to be doing with it ¬¬
15:19
<@McMartin>
At this point you're in "what your assignment is" territory and we're unlikely to be a lot of help.
15:19
<@McMartin>
(By which I mean "involving the spec of the assignment", not "we aren't going to do your homework for you")
15:21
< Moltare>
As far as I've worked it out: parser.c has the parse() method which does the actual parsing, and #includes a tokens.h file and a lex.yy.c file.
15:21
< Moltare>
lex.yy.c is what flex creates.
15:22
<@McMartin>
Right.
15:22
< Moltare>
tokens.h defines the token structure globally as having a type and possibly a value, and lists the type for a given token
15:22
<@McMartin>
And presumably, right now your parse() is just reading the stream and dumping it?
15:22
< Moltare>
(as a big column of #define T_COMMA 125, etc.)
15:22
<@McMartin>
Right
15:23
< Moltare>
Right now I have no parse(), as I've only just got the lexer putting out stuff on command
15:23
<@McMartin>
OK.
15:23
< Moltare>
That, I think, is step 1
15:23
<@McMartin>
Aye.
15:23
< Moltare>
Get the tokens.h file and make it play nicely with a basic parse() method
15:23
<@McMartin>
So, parse() is going to be taking the output of the lexer as a stream of tokens, and turning it into some kind of (probably tree-recursive) structure.
15:24
< Moltare>
recursive-descent, as specified
15:24
< Moltare>
(in my spec, that is, not 'as I have already mentioned')
15:24
<@McMartin>
Well, that's the parser's implementation
15:24
<@McMartin>
By "tree-recursive" I mean that you're producing a list of Expressions or whatnot
15:24
< Moltare>
Oh.
15:24
< Moltare>
Yes.
15:24
<@McMartin>
And Expressions themselves can be made of expressions.
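To make "expressions made of expressions" concrete, here is a small recursive-descent sketch for a toy grammar: expr := term { "+" term } and term := INT | "(" expr ")". This is not the assignment's language. The token kinds, the Expr node layout and all function names are hypothetical, and the token array stands in for repeated yylex() calls. Each grammar rule becomes one function that returns a subtree. Nothing is freed, in line with the lecturer's advice.

```c
#include <stdlib.h>

// Hypothetical token kinds for the toy grammar.
typedef enum { T_INT, T_PLUS, T_LPAREN, T_RPAREN, T_EOF } TokKind;
typedef struct { TokKind kind; int intval; } Tok;

// AST node: either an integer literal or the sum of two sub-expressions.
typedef enum { E_INT, E_ADD } ExprKind;
typedef struct Expr {
    ExprKind kind;
    union {
        int intval;                                   // when kind == E_INT
        struct { struct Expr *left, *right; } add;    // when kind == E_ADD
    } u;
} Expr;

static const Tok *cur;   // next unread token (stands in for calling yylex())

static Expr *new_expr(ExprKind kind)
{
    Expr *e = calloc(1, sizeof *e);
    if (e == NULL)
        abort();                                 // out of memory: give up
    e->kind = kind;
    return e;
}

static Expr *parse_expr(void);

// term := INT | '(' expr ')'
static Expr *parse_term(void)
{
    if (cur->kind == T_INT) {
        Expr *e = new_expr(E_INT);
        e->u.intval = cur->intval;
        cur++;
        return e;
    }
    if (cur->kind == T_LPAREN) {
        cur++;                                   // consume '('
        Expr *e = parse_expr();
        if (e == NULL || cur->kind != T_RPAREN)
            return NULL;                         // missing ')'
        cur++;                                   // consume ')'
        return e;
    }
    return NULL;                                 // unexpected token
}

// expr := term { '+' term }, building a left-leaning tree of E_ADD nodes
static Expr *parse_expr(void)
{
    Expr *left = parse_term();
    while (left != NULL && cur->kind == T_PLUS) {
        cur++;                                   // consume '+'
        Expr *right = parse_term();
        if (right == NULL)
            return NULL;
        Expr *sum = new_expr(E_ADD);
        sum->u.add.left = left;
        sum->u.add.right = right;
        left = sum;
    }
    return left;
}

// Parses a token array terminated by T_EOF; returns NULL on a syntax error.
Expr *parse(const Tok *tokens)
{
    cur = tokens;
    Expr *e = parse_expr();
    return (e != NULL && cur->kind == T_EOF) ? e : NULL;
}

// Evaluates the tree with a recursion that mirrors its shape.
int eval(const Expr *e)
{
    if (e->kind == E_INT)
        return e->u.intval;
    return eval(e->u.add.left) + eval(e->u.add.right);
}
```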
15:25
<@McMartin>
Have you done anything with unions in C?
15:25
< Moltare>
No, and I note that your use of "with unions" is superfluous.
15:25
< Moltare>
I have never touched C before this assignment. ¬¬
15:26
<@McMartin>
OK, so.
15:26
<@McMartin>
A union is sort of like a struct, except that all of the members overlap.
15:26
<@McMartin>
This lets you do horrifically awful things to memory, much like everything else in C.
15:26
<@McMartin>
More to the point, it's a way to get Polymorphism.
15:26
<@McMartin>
You go, say:
15:26
<@McMartin>
struct TOKEN {
15:26
<@McMartin>
int tag;
15:26
<@McMartin>
union {
15:26
<@McMartin>
char * stringval;
15:26
<@McMartin>
int intval;
15:26
<@McMartin>
} value;
15:27
<@McMartin>
};
15:27
< Moltare>
OH, right
15:27
< Moltare>
So it can be either
15:27
< Moltare>
(what does the * represent?)
15:27
<@McMartin>
And then if you access the wrong value of the union you corrupt memory and possibly bring down the entire machine
15:27
<@McMartin>
"address of previous type"
15:27
<@McMartin>
C has no concept of strings.
15:28
<@McMartin>
Instead, you use an address of a character, and hope and pray that there is a null byte at an appropriate point in the future.
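Here is a small sketch of the tag-plus-union pattern in use, based on McMartin's struct TOKEN above. The TAG_* values and token_text() are hypothetical. The key habit is to read only the union member that the tag says is valid. It also helps to remember that stringval is just a pointer to the first character of a NUL-terminated array.

```c
#include <stdio.h>

enum { TAG_STRING, TAG_INT };   // hypothetical tag values

struct TOKEN {
    int tag;                    // says which union member is valid
    union {
        char *stringval;        // valid only when tag == TAG_STRING
        int intval;             // valid only when tag == TAG_INT
    } value;
};

// Returns printable text for a token, reading only the member its tag names.
// Int tokens are formatted into a static buffer that the next call overwrites.
const char *token_text(const struct TOKEN *t)
{
    static char buf[32];
    switch (t->tag) {
    case TAG_STRING:
        return t->value.stringval;   // must point at a NUL-terminated string
    case TAG_INT:
        snprintf(buf, sizeof buf, "%d", t->value.intval);
        return buf;
    default:
        return "?";                  // unknown tag: touch neither member
    }
}
```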
15:28
<@gnolam>
Eh, the real usefulness of unions lies in serialization.
15:28
<@gnolam>
IMO.
15:28
< Moltare>
The more I hear of C, the more I wonder why everyone hates Java so much ¬¬ it seems to be far more intent on exsanguinating you
15:28
<@McMartin>
As an ML partisan, I beg to differ. They're for implementing Constructor types.
15:29
<@McMartin>
Moltare: C partisans feel that Java's inability to completely fuck you over for the tiniest mistake is an unconscionable assault on their freedom as a programmer.
15:29
<@ToxicFrog>
Moltare: C, unlike Java, is useful for implementing kernels, device drivers, and other low-level-but-we-don't-want-to-write-this-in-asm stuff.
15:29
<@McMartin>
And yes, said freedom is actually necessary for direct hardware control.
15:30
<@ToxicFrog>
The same features that make it useful for that also make it insanely dangerous, though~
15:30
<@McMartin>
That said, when your professor said that this would be vastly easier in C, he was lying through his teeth. I suspect his actual intent was to make you actually implement stuff on your own instead of just handing it over to library classes.
15:30
<@McMartin>
Like, you know, String.
15:30
<@McMartin>
And HashMap.
15:31
<@ToxicFrog>
Quite.
15:32
< Moltare>
So, having created our token structure and given the appropriate #define T_COMMA someintegervalue in the token.h file, I then get the lexer to return T_COMMA when it hits "," in the program you hand it
15:32
< Moltare>
"," { return T_COMMA; } sort of thing
15:33
<@McMartin>
(Also, less hostilely, because the Java version of the Tiger book uses a totally different technique than the C/ML version, revolving around Visitors)
15:33
<@McMartin>
Mol: That sounds about right, yes.
15:33
<@McMartin>
IIRC, calls to yylex() will assign some global structure that will let you get the juicy datameats out once this is done
15:33
<@ToxicFrog>
Although generally T_COMMA would be part of an enum, rather than a straight #define.
15:33
<@McMartin>
TF: It's flex. It does its own thing.
15:34
< Moltare>
And when it's a variable I get the lexer to assign yytext to stringval and then return T_VAR?
15:34
<@McMartin>
Right.
15:34
<@ToxicFrog>
McMartin: no, you need to provide them yourself.
15:34
< Moltare>
{ID} { stringval = yytext; return T_VAR; }
15:34
<@McMartin>
And then parse() needs to know that a T_VAR means you need to read stringval.
15:34
<@ToxicFrog>
That's what y.tab.h is for, but if you aren't using yacc, that doesn't get generated.
15:34
<@McMartin>
Aha
15:34 * McMartin has never used flex alone, so.
15:35
<@ToxicFrog>
So instead you need to write your own (say) tokens.h, and #include it in your lexer and parser
15:35
<@McMartin>
Aha.
15:35
< Moltare>
TF: Which is why I need to write the token.h file and #
15:35
< Moltare>
right
15:35
<@McMartin>
Anyway, he's right.
15:35
<@McMartin>
Instead of #define T_COMMA etc.
15:35
<@ToxicFrog>
And the contents are something like: enum Tokens { T_COMMA, T_SEMICOLON, T_OPENPAREN, T_STRING, T_INT, ..., T_NUMTOKENTYPES }
15:35
< Moltare>
I note I've never heard of enum; what's the distinction?
15:36
<@ToxicFrog>
Enum creates symbols rather than macros.
15:36
<@ToxicFrog>
#define is basically a global search-and-replace.
15:36
<@ToxicFrog>
Enum creates what are, in effect, constants with automatically assigned values.
15:39
< Moltare>
I thought global search-and-replace was what I was doing here
15:39
<@ToxicFrog>
...
15:39
< Moltare>
"When you see T_COMMA, read it as 145 and give that to the "type" variable"
15:40
< Moltare>
Or am I totally lost again?
15:40
<@ToxicFrog>
It is what you are doing with #define, yes
15:40
<@McMartin>
See, the idea here is that it's better to say "read it as something unique, I don't care what"
15:40
<@ToxicFrog>
As a general rule, though, you don't want to do that if you don't have to; and using an enum guarantees that all the values are unique without you having to worry about that, too.
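The sketch below contrasts the two approaches using the token names from ToxicFrog's enum. D_COMMA, D_SEMICOLON and token_name() are hypothetical. With #define, the programmer has to keep the values unique by hand. With enum, the compiler assigns 0, 1, 2, ... in order, and the trailing T_NUMTOKENTYPES conveniently equals the number of real token kinds.

```c
// With #define, each value is chosen by hand and nothing stops two names
// from sharing a value (125 is just the number used in the channel).
#define D_COMMA     125
#define D_SEMICOLON 125   // collision the compiler will not flag

// With enum, the compiler numbers the constants 0, 1, 2, ... in order, so
// every kind is distinct; the last entry doubles as a count of the others.
enum Tokens { T_COMMA, T_SEMICOLON, T_OPENPAREN, T_STRING, T_INT, T_NUMTOKENTYPES };

// Enum constants are ints, so they can size and index tables.
static const char *const token_names[T_NUMTOKENTYPES] = {
    "T_COMMA", "T_SEMICOLON", "T_OPENPAREN", "T_STRING", "T_INT"
};

const char *token_name(int t)
{
    return (t >= 0 && t < T_NUMTOKENTYPES) ? token_names[t] : "?";
}
```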
15:44
< Moltare>
Fair enough; but then what goes in "tag" in McM's example struct above? since we're not giving them integer tag values
15:45
<@McMartin>
enums are secretly integers
15:45
<@McMartin>
What enum does is abstract out what the actual value is
15:46
<@McMartin>
(Also, "tag" in this case is actually what yylex() is returning)
15:46
< Moltare>
um
15:46
<@McMartin>
(when you return T_COMMA or what not, that value is assignable to an int variable)
15:47
< Moltare>
And flex knows to drop the value of T_COMMA into tag automatically?
15:48
<@McMartin>
I don't believe so, now that you mention it - I defer to TF on how the API actually works.
15:48
<@McMartin>
I used it merely as an example.
15:48
<@McMartin>
You'd have some *other* enum for expression types - and that's where tags would go and such
15:49
<@McMartin>
You'd just be reading return values and the global yytext value when interpreting lexemes.
15:49
<@ToxicFrog>
Alternately, have flex construct and return the token struct
15:49
<@McMartin>
Oh god, the memory management hassles =(
15:49
<@McMartin>
It's going to be bad enough with the AST.
15:50
<@ToxicFrog>
[0-9]+ { Token * tok = new_token(); tok->type = T_INTEGER; tok->value.intval = atol(yytext); return tok; }
15:51
<@ToxicFrog>
IME, this makes the code more clear while making memory management slightly trickier.
15:51
<@ToxicFrog>
But not hugely trickier; it's just callee-allocates, caller-frees.
15:52
<@McMartin>
With added ugliness if you need to duplicate yytext's values; you'll need to ensure that either all strvals are safe to free, or that none of them need to be.
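Here is a minimal sketch of "callee-allocates, caller-frees" for a variable token. It assumes a Token layout like the one under discussion; TokenType, make_var_token() and free_token() are hypothetical names. Because flex reuses the buffer behind yytext, the lexer copies the name. That makes every string value safe to free, which is the first of McMartin's two options.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { T_INTEGER, T_VAR } TokenType;    // hypothetical subset of kinds

typedef struct {
    TokenType type;
    union {
        int intval;          // used by T_INTEGER
        char *stringval;     // used by T_VAR; always a heap copy
    } value;
} Token;

// Callee allocates: the token and a private copy of the lexeme, since the
// text behind yytext is only valid until flex scans the next token.
Token *make_var_token(const char *lexeme)
{
    Token *tok = calloc(1, sizeof *tok);
    if (tok == NULL)
        return NULL;
    size_t n = strlen(lexeme) + 1;          // include the terminating NUL
    tok->value.stringval = malloc(n);
    if (tok->value.stringval == NULL) {
        free(tok);
        return NULL;
    }
    memcpy(tok->value.stringval, lexeme, n);
    tok->type = T_VAR;
    return tok;
}

// Caller frees: safe because every stringval was allocated above.
void free_token(Token *tok)
{
    if (tok == NULL)
        return;
    if (tok->type == T_VAR)
        free(tok->value.stringval);
    free(tok);
}
```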
15:52
< Moltare>
I note the lecturer specifically stated "don't bother freeing memory, it's not worth it for this"
15:52
<@ToxicFrog>
But yes. Flex doesn't know anything about what structures you're using, or tags, or anything.
15:52
<@ToxicFrog>
When it gets a match, it sets yytext to the actual text that matched, executes the corresponding code, and that's it.
15:53
<@McMartin>
Is yytext a global or an argument of some kind?
15:53
<@ToxicFrog>
Global. extern const char * yytext, IIRC.
15:56
< Moltare>
So if I do it that way, I don't have to play around with token.h? just define the token struct at the top of the lexer and have it return tokens which potentially have values added?
15:56
<@ToxicFrog>
You still need token.h
15:56
<@ToxicFrog>
Otherwise, how does the parser tell what kind of token it is?
15:56
<@McMartin>
Otherwise the value T_INTEGER or whatnot won't exist.
15:57
< Moltare>
Right, but it'd just be a list of "This is T_INTEGER; it has an intval in it. This is T_COMMA; it has nothing in it. This is T_..."
15:58
<@McMartin>
Yeah.
15:58
<@McMartin>
And actually, the "it has a FOO in it" can be implicit.
15:58
<@ToxicFrog>
Er
15:58
<@ToxicFrog>
?
15:58
<@McMartin>
As long as it's unique, and parse() only ever reads the right value, Life Is Good.
15:58
<@ToxicFrog>
Wouldn't it be a list of enums and a -single- struct-union definition?
15:59
<@McMartin>
Well, if you're making a Universal Token Type.
15:59
<@McMartin>
I'm imagining a case where you're communicating solely through a stream of globals and return values a la yytext.
15:59
< Moltare>
If the struct-union definition is in lexer.l's definitions, would you need it in token.h too?
15:59
<@McMartin>
It would probably be better to have it only be in token.h, unless there's some bizarre part of lexer.l I'm not grokking
16:00
<@ToxicFrog>
Moltare: it would be -only- in token.h
16:00
<@ToxicFrog>
Which is then #included by both the lexer and the parser
16:00
< Moltare>
oh, right
16:00
<@ToxicFrog>
Thus, they get the same definition for the token types, and for the layout of a Token struct, and they agree on everything
16:03
<@ToxicFrog>
http://lua.pastey.net/83554 -- a very simple example which assumes tokens only need to worry about int or string values (or no values)
16:03
<@ToxicFrog>
So, .tag is set to T_<something>, so that you can tell what kind of token a given Token struct is.
16:03
<@ToxicFrog>
And if that type has an associated value (say, T_INTEGER or T_STRING), the corresponding .value.<type>val is filled in.
16:04
< Moltare>
Makes sense
16:04
<@ToxicFrog>
So, the lexer can create and populate a Token struct appropriately for each token; and the parser can then look at that struct and figure out what kind it is and what value, if any, it has.
16:04
<@ToxicFrog>
(and then based on that the parser does the actual parsing thing)
16:05
< Moltare>
And do I need to define new_token() somewhere?
16:05
<@McMartin>
Yeah. That's essentially a one-liner
16:05
<@McMartin>
return malloc (sizeof (Token));
16:06
<@McMartin>
"Give me a chunk of uninitialized memory of this size"
16:06
<@McMartin>
If you want it to be zeroed by default, use calloc
16:06
<@ToxicFrog>
Token * new_token() { return malloc(sizeof(Token)); } /* create enough memory to hold a Token and return a pointer to it */
16:06
< Moltare>
Oh, right, the whole 'manage your own memory' thing
16:06
<@McMartin>
Any malloc()ed memory will need to be manually free()ed when you're done with it.
16:06
<@McMartin>
And for God's sake, only ever free() it once, and don't access it after it's been free()ed.
16:06
< Moltare>
It doesn't need free()ing, sez lecturer
16:07
<@ToxicFrog>
In this case you probably don't need to worry about that, because it's a short-running program and the OS will free it all when it exits.
16:07
<@ToxicFrog>
Which is what the lecturer is saying.
16:07
<@ToxicFrog>
(sidenote: if you have "Token * foo", you access its internals with "foo->tag", rather than "Token foo" and "foo.tag")
16:08
<@ToxicFrog>
(and since you are now playing with memory management and pointers, you will indeed have Token * foo)
16:10 Vornicus-Latens [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout]
16:15
< Moltare>
C doesn't natively do scientific notation, does it?
16:16
<@ToxicFrog>
Yes it does.
16:16
< Moltare>
Ah? handy
16:16
<@ToxicFrog>
double foo = 1.0e+06; /* compiles! */
16:16
< Moltare>
I tried to look it up but found no references that weren't C++ or C#
16:17
<@McMartin>
Is atof() smart enough to read those?
16:17
<@ToxicFrog>
Yes.
16:17
<@ToxicFrog>
Well, strtod() is
16:17
<@ToxicFrog>
And the atof man page says the behaviour is "identical to strtod except it does not report errors"
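Here is a sketch of what that difference means in practice. strtod() reports where it stopped through its second argument and sets errno to ERANGE when the result is out of range; atof() gives you neither. parse_double() is a hypothetical helper for a lexer's floating-point rule.

```c
#include <errno.h>
#include <stdlib.h>

// Converts an entire numeric lexeme such as "1.0e+06" to a double.
// Unlike atof(), strtod() reports problems: end shows where conversion
// stopped, and errno is set to ERANGE if the value is out of range.
// Returns 1 and stores the result on success, 0 otherwise.
int parse_double(const char *text, double *out)
{
    char *end;
    errno = 0;
    double d = strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;   // nothing converted, trailing junk, or out of range
    *out = d;
    return 1;
}
```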
16:18
<@McMartin>
Ah yes, another grand C tradition.
16:18
<@McMartin>
In other news, gets() is still required by all conforming runtimes.
16:18 * ToxicFrog heads off to campus. Later!
16:18 * McMartin goes to perform his ablutions.
16:19
< Moltare>
hooray for stuff!
16:20
< Moltare>
(and thanks for your patience)
16:20
<@McMartin>
(flex ends up being an actually useful tool in its own right, for all kinds of stuff)
16:20
<@McMartin>
(Granted, in nearly all of these cases you should for the love of God not be writing it in C)
16:25 gnolam [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has quit [Ping Timeout]
16:26 gnolam [lenin@85.8.5.ns-20483] has joined #Code
16:26 mode/#code [+o gnolam] by ChanServ
16:30
< Moltare>
31 errors, woo
16:30
< Moltare>
Although 28 of them appear to be identical
16:30
<@McMartin>
Forgotten commas?
16:32
< Moltare>
No, it's complaining about enum token { T_BLAH... } and struct token { stuff here }
16:33
< Moltare>
Conflicting types, previous declarations, and something about not being able to return voids which I think might be a result of the first one. Also lots of whining about token which is clearly relating to the first issue
16:33
<@McMartin>
Ah, yes.
16:34
<@McMartin>
(I think you want typedef enum { T_BLAH... } TOKEN; and typedef struct token_struct { ... } token; )
16:37
< Moltare>
yay, different errors
16:37
< Moltare>
I don't even understand what the first one is saying, this time ¬¬
16:37
< Moltare>
Says, "In function `struct token * new_token()':"
16:38
< Moltare>
Or is that just "All of the following errors are here" or similar?
16:39
<@McMartin>
Yeah, that's "look here for what's going on"
16:39
< Moltare>
Then there's one "ANSI C++ forbids implicit conversion from `void *' in return", and a shitload of "request for member `type' in `tok', which is of non-aggregate type `token *'" and "return to `int' from `token *' lacks a cast"
16:40
<@McMartin>
OK, the shitload is from you using .type instead of ->type
16:41
<@McMartin>
The ANSI C++ thing can DIAF and should only be a warning anyway, since you aren't *writing* C++...
16:41
<@McMartin>
If you want to get rid of it, make it return (token *)malloc (etc)
16:42
< Moltare>
ta
16:42
<@McMartin>
The "return to int from token *' lacks a cast" implies to me that your function declaration either forgot to declare a return type, or the people calling it don't know about its types
16:42
<@McMartin>
If the former, declare new_token as "token * new_token(void)"
16:43
<@McMartin>
If the latter, add the prototype "token *new_token(void);" - with the semicolon - to token.h
16:43
<@McMartin>
After the definition of the type
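Pulling McMartin's suggestions together, a tokens.h-style layout might look like the sketch below. The type and token names are illustrative and follow his typedef suggestion. The types come first and the prototype after them, so every file that includes the header knows that new_token returns a pointer rather than C's implicit int. The definition then lives in exactly one .c file.

```c
// ---- tokens.h (illustrative) ----
#ifndef TOKENS_H
#define TOKENS_H

typedef enum { T_IF, T_INTEGER, T_STRING } TOKEN;   // token kinds

typedef struct token_struct {
    TOKEN type;
    union {
        int intval;
        char *stringval;
    } value;
} token;

// The prototype comes after the type it mentions, and ends in a semicolon.
token *new_token(void);

#endif

// ---- one .c file only: the matching definition ----
#include <stdlib.h>

token *new_token(void)
{
    // calloc returns zero-filled memory. A (token *) cast is only needed
    // when a C++ compiler builds this file; plain C converts void * itself.
    return calloc(1, sizeof(token));
}
```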
16:43
< Moltare>
(Is it token->value.intval or token->value->intval, idly?)
16:44
<@McMartin>
(token->value.intval, as value is not a pointer)
16:44
< Moltare>
(excellent, got something right)
16:47
< Moltare>
(with the result that I've only got the 'lacks a cast' ones left *fiddlefiddle*)
16:47
<@McMartin>
C assumes that any function it's never heard of returns an int and takes any number of untyped arguments
16:47
<@McMartin>
This is Always A Horrifically Bad Idea, so you need to type-declare them first in the header files.
16:49
< Moltare>
how random. Why int?
16:49
<@McMartin>
Because C's predecessor language only had two types; int and int*.
16:50
< Moltare>
Isn't that a bit... limiting?
16:50
<@McMartin>
With "int" defined as "whatever size you can shove into the hardware's register"
16:50
<@McMartin>
This would have been the late 60s/early 70s.
16:50
<@McMartin>
The idea that you could write "x+y*z" and have it work out operator precedence and assign temporary registers and stuff was still Hot Shit.
16:51
<@McMartin>
Though not *brand* new, the way it was for FORTRAN.
16:51
<@McMartin>
So called because it was a FORmula TRANSlator, and thus astonishing and new
16:53
< Moltare>
hm. Adding the prototype breaks the "forbids implicit conversion" thing again. And then removing it doesn't unbreak it.
16:53
< Moltare>
Za.
16:53
<@McMartin>
OK, the prototype should be there anyway
16:54
<@McMartin>
Where's the "implicit conversion" error?
16:54
<@McMartin>
And what's the line that produces it?
16:54
< Moltare>
I'd tell you, but it's gone again
16:55
<@McMartin>
Hmm.
16:55
< Moltare>
Also Dev-C++ is now telling me it's out of memory, and not letting me close it
16:55
<@McMartin>
whut
16:56 * Moltare process-kills it, starts it up again, shrugs
16:58
< Moltare>
Right, now it's back to only doing the lacks-a-cast thing
16:58
<@McMartin>
What line produces it?
16:58
< Moltare>
Any line of the .l file that attempts to return a token
16:59
< Moltare>
"if" { token * tok = new_token(); tok->type = T_IF; return tok; } and its ilk
16:59
< Moltare>
(hence 26 of the original 31 errors being identical)
16:59
<@McMartin>
Aha
16:59
<@McMartin>
This sounds like yylex() isn't being prototyped.
16:59
<@McMartin>
Maybe add a token *yylex(void); to token.h too?
16:59
<@McMartin>
I'm stabbing in the dark here
17:00
<@McMartin>
If yylex() has a forced prototype, then you're kind of screwed
17:01
< Moltare>
15 tokens.h
17:01
< Moltare>
ambiguates old declaration `struct token * yylex()'
17:01
< Moltare>
New and shiny extra error from that
17:01
<@McMartin>
Where was the old declaration?
17:01
< Moltare>
I never made the old declaration
17:01
< Moltare>
Presumably flex did it for me
17:02
<@McMartin>
Hm. Somewhere in lexer.l you've defined "struct token *"
17:02
<@McMartin>
Turn that to just "token *" if you did the typedef
17:03
< Moltare>
I've not defined "struct token *" anywhere
17:03
<@McMartin>
Hum
17:03
<@McMartin>
Can you paste lexer.l somewhere?
17:04
< Moltare>
Certainly
17:05
< Moltare>
Can't send the url
17:05
< Moltare>
oh, wate
17:05
< Moltare>
no voice :P
17:05
< Moltare>
(pm'd)
17:13 You're now known as TheWatcher[afk]
17:19 mode/#code [+v Moltare] by ChanServ
17:29
<+Moltare>
Hashtable next, then!
17:29
<+Moltare>
- stores variables
17:29
<+Moltare>
- has place(thing) and find(thing)
17:30
<+Moltare>
if find(thing) fails, uses place(thing)
17:30
<+Moltare>
- is basically just like I'd do it in Java?
17:30
<+Moltare>
oh, and
17:30
<+Moltare>
- goes in the parser .c file?
17:40 * Moltare makes a shoddy first draft, leaves it for now
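For comparison with the Java version, here is a minimal C sketch of a chained hash table with the find/place pair Moltare describes. The bucket count, the djb2-style hash, the int payload and the Entry layout are all assumptions. There is no removal and nothing is freed, which matches the lecturer's advice. Unlike Java's HashMap, the table has to copy the key strings itself.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211   // arbitrary prime bucket count

typedef struct Entry {
    char *name;              // variable name (owned copy)
    int value;               // whatever the parser wants to store
    struct Entry *next;      // next entry in the same bucket
} Entry;

static Entry *buckets[NBUCKETS];   // all NULL at program start

// djb2-style string hash (Bernstein), reduced to a bucket index.
static unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

// Returns the entry for name, or NULL if it has not been placed yet.
Entry *find(const char *name)
{
    for (Entry *e = buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

// Updates name if it already exists, otherwise adds it at the head of its
// bucket. Returns the entry, or NULL if memory ran out.
Entry *place(const char *name, int value)
{
    Entry *e = find(name);
    if (e != NULL) {
        e->value = value;
        return e;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    size_t n = strlen(name) + 1;
    e->name = malloc(n);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, name, n);
    e->value = value;
    unsigned long b = hash(name);
    e->next = buckets[b];
    buckets[b] = e;
    return e;
}
```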
17:41
< C_tiger>
Mol: if you haven't got the regex: \/\*(.*?)\*\/ will work for you
17:41
<+Moltare>
I got it, but thank you all the same :)
17:42
< C_tiger>
Yeah, there was a little too much upscroll to read.
17:43
< C_tiger>
But that's literal /* (any character as many times as needed but MINIMAL number of times so the rest of the regex fits) literal */
17:43
< C_tiger>
parentheses unnecessary.
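The "minimal number of times" behaviour C_tiger describes means each comment ends at the first closing `*/`, not the last one. A hand-written scanner makes that explicit. This is a sketch, not what the lexer in this log used: the function name `strip_block_comments` is hypothetical, and it ignores comment markers inside string literals. Two caveats are hedged here. In many regex engines `.` does not match a newline by default, so the lazy regex may miss multi-line comments, whereas this scanner crosses newlines. Also, flex patterns have no lazy `*?` operator, so in a `.l` file the usual alternatives are start conditions or a pattern such as `"/*"([^*]|\*+[^*/])*\*+"/"`.

```c
#include <assert.h>
#include <string.h>

/* Copies `src` into `dst`, dropping every block comment. Like the lazy
   regex, each comment ends at the FIRST closing star-slash after its opener.
   `dst` must have room for strlen(src) + 1 bytes. Comment markers inside
   string literals are not special-cased. */
void strip_block_comments(const char *src, char *dst) {
    assert(src != NULL && dst != NULL);
    while (*src) {
        if (src[0] == '/' && src[1] == '*') {
            const char *end = strstr(src + 2, "*/");
            if (end == NULL) {        /* unterminated: the regex would not match, */
                strcpy(dst, src);     /* so keep the remainder untouched */
                return;
            }
            src = end + 2;            /* resume just past the first closer */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

For the input `/* a */*/`, only `/* a */` is removed and `*/` survives. A greedy `.*` would have swallowed both closers.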
17:51 Vornotron [~vorn@Admin.Nightstar.Net] has joined #code
17:57
<@McMartin>
Also, for the record, from the PM discussion
17:58
<@McMartin>
If you're using flex and you want yylex() to return something that isn't an int, you have to #define YY_DECL to be the (semicolon-free) prototype for your scanner function.
17:59
<@McMartin>
Otherwise you'll get type conflicts, which are The Lose
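To make the YY_DECL point concrete: flex emits the macro `YY_DECL` as the header of the generated scanner function, and its default is `int yylex (void)`. Redefining it in the `%{ ... %}` block of the `.l` file changes the return type. The sketch below stands in for the generated code in plain C. The `token` struct, `new_token`, and `T_IF` mirror names from the rule Moltare quoted, but their definitions here are assumptions. Any other file that calls yylex() also needs the matching prototype `token *yylex(void);`, for example in the shared header McMartin suggested.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token type and helper, mirroring the ones in the .l rules. */
enum token_type { T_IF, T_ELSE };

typedef struct token {
    enum token_type type;
} token;

static token *new_token(void) {
    token *t = malloc(sizeof *t);
    assert(t != NULL);
    return t;
}

/* In lexer.l this #define goes in the %{ ... %} block before the rules.
   Flex places YY_DECL in front of the scanner body, so yylex() is now
   declared as returning token * instead of the default int. */
#define YY_DECL token *yylex(void)

/* Stand-in for the flex-generated body, which would run the rule
   "if" { token *tok = new_token(); tok->type = T_IF; return tok; } */
YY_DECL
{
    token *tok = new_token();
    tok->type = T_IF;
    return tok;
}
```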
18:39 Vornotron [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout]
18:46
<@AnnoDomini>
Hm. Would anyone know where I could find an implementation of the Bresenham algorithm in assembly? Preferably x86, but most anything will do.
18:47 Vornotron [~vorn@Admin.Nightstar.Net] has joined #code
18:59 You're now known as TheWatcher
19:04 * AnnoDomini will try converting the pseudocode from Wikipedia, then.
19:06
< Vornotron>
What's the subject?
19:06
<@AnnoDomini>
Bresenham line algorithm in assembly.
19:06
< Vornotron>
Aha
19:08
<@AnnoDomini>
We're supposedly given pseudocode for it in the materials for the class, but it looks to be the basic version, which won't help me.
19:08
<@AnnoDomini>
And it doesn't work, either.
19:17
<@McMartin>
I have C code for it in my first edition Graphics Gems book.
19:18
<@McMartin>
I believe I used it to create an assembler Bresenham's for the C64.
19:18
< Vornotron>
Bresenham is pretty easy, in the end.
19:18
<@McMartin>
Which is the wrong chip. =P
19:18
<@McMartin>
Yes.
19:18
<@McMartin>
SDL_gfx also has an implementation of it, I believe.
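The "basic version" of Bresenham in many textbooks handles only one octant, for example slopes between 0 and 1 with x increasing. Below is a sketch of a common all-octant integer variant in C, which can serve as a reference when hand-translating to x86. It is not the Graphics Gems or SDL_gfx code mentioned above. The name `bresenham_line` and its output-array interface are hypothetical. The inner loop uses only additions, comparisons and a doubling, all of which map directly to assembly instructions.

```c
#include <assert.h>
#include <stdlib.h>

/* All-octant integer Bresenham line from (x0,y0) to (x1,y1), endpoints
   included. Stores up to `max` points in xs/ys and returns the total
   number of points on the line. */
int bresenham_line(int x0, int y0, int x1, int y1,
                   int xs[], int ys[], int max) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;            /* combined error term for both axes */
    int n = 0;
    for (;;) {
        if (n < max) {
            xs[n] = x0;
            ys[n] = y0;
        }
        n++;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;         /* a single left shift in assembly */
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
    return n;
}
```

Because `sx` and `sy` carry the direction, the same loop covers steep, shallow, and reversed lines without separate octant cases.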
20:02 Vornotron is now known as Finerty
20:08
<@MyCatVerbs>
McMartin: type conflicts in _C_ of all languages are Double Lose, with Extra Lose on the Side.
20:11 Attilla [~The.Attil@194.72.70.ns-11849] has quit [Quit: <Insert Humorous and/or serious exit message here>]
20:14
<@ToxicFrog>
Moltare: concerning the hash table: typically this would go in a separate file, say, hash.c
20:15
<+Moltare>
Oh, which I then #include in parser.c?
20:15
<@ToxicFrog>
It gets a corresponding header, hash.h, which other files that use it #include (and contains function declarations and suchlike)
20:15
<@ToxicFrog>
No, you #include the hash.h
20:15
<@ToxicFrog>
The actual function _code_ goes in hash.c
20:15
<@ToxicFrog>
Which then gets combined with the rest of the program at link time.
20:16
<@ToxicFrog>
(also: it occurs to me that if you're using Dev-C++, the "ANSI C++ forbids..." warnings might be because it's trying to compile it as C++; double check your project settings)
20:17 * gnolam ponders launching into one of his anti-Dev-C++ rants again.
20:17 * GeekSoldier gets the popcorn.
20:17
<@ToxicFrog>
gnolam: hold it until after Mol is done with his homework, please?
20:18
<@ToxicFrog>
And for the record, I suggested just using gcc/MSYS directly, since this is a small project.
20:18
<@ToxicFrog>
Moltare: anyways. The idea is that a .c file holds actual code. A .h file holds struct, enum, and function declarations, #defines, and whatnot - everything that other .c files need in order to make use of that code.
20:19
<@ToxicFrog>
At build time, all the .c become .o (object code), and those are all combined into your executable or library.
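As an illustration of that split, here is what a hypothetical hash.h for the variable table might contain. The names `struct entry`, `find`, `place` and `lookup` are assumptions, not taken from Moltare's code. It holds only the struct definition, declarations, and an include guard, and the function bodies would live in hash.c. With plain gcc, each .c file is compiled to an object file on its own (for example `gcc -c hash.c` and `gcc -c parser.c`) and the objects are then linked (`gcc hash.o parser.o -o calc`, with the output name also hypothetical).

```c
/* hash.h: hypothetical interface for the variable table in hash.c */
#ifndef HASH_H
#define HASH_H

/* The struct layout goes here so every .c file that #includes this
   header agrees on it. */
struct entry {
    char *name;
    int value;
    struct entry *next;
};

/* Declarations only; the function bodies live in hash.c. */
struct entry *find(const char *name);
struct entry *place(const char *name, int value);
struct entry *lookup(const char *name, int initial);

#endif /* HASH_H */
```

The include guard lets several headers pull in hash.h without the struct being defined twice in one translation unit.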
20:22
<@gnolam>
McMartin: you wouldn't happen to have code for Bresenham ellipses in there somewhere as well?
20:22
<@McMartin>
(The key difference here is that C, unlike Java or Python, compiles each code file in a vacuum)
20:22
<@McMartin>
gnolam: I have no idea; the book is buried somewhere in my closet
20:27 Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has quit [Ping Timeout]
20:27 Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has joined #code
20:32 Attilla [~The.Attil@194.72.70.ns-11849] has joined #code
20:33 Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has quit [Ping Timeout]
20:40 Moltare [~moltare@82.32.73.ns-25785] has joined #code
20:42
<@ToxicFrog>
Wibs.
20:42
<@AnnoDomini>
Question - what flag signifies the presence of a negative number in the x86 architecture?
20:43
<@AnnoDomini>
SF?
20:43
<@AnnoDomini>
Can't seem to find anything on x86 flags.
20:45
<@ToxicFrog>
Yes, it's SF
20:46
<@ToxicFrog>
Stands for "Sign Flag"
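SF is simply a copy of the top bit of the result. It therefore signals a negative number only when the value is read as signed. The C sketch below models the flags an x86 `CMP a, b` would set for 32-bit operands. `CMP` computes `a - b`, throws the result away, and keeps the flags. The function and struct names are hypothetical. On x86, `JS` and `JNS` test SF alone, while the signed comparisons `JL` and `JGE` look at whether SF differs from OF.

```c
#include <assert.h>
#include <stdint.h>

/* Flags that x86 CMP a, b would set (32-bit operands). */
struct flags {
    int sf;   /* sign flag: top bit of the result */
    int zf;   /* zero flag: result was zero */
    int cf;   /* carry flag: unsigned borrow, i.e. a < b as unsigned */
    int of;   /* overflow flag: signed result did not fit in 32 bits */
};

struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;               /* wraps modulo 2^32, like the hardware */
    struct flags f;
    f.sf = (int)(r >> 31);
    f.zf = (r == 0);
    f.cf = (a < b);
    /* Signed overflow on subtraction: the operands differ in sign and
       the result's sign differs from a's. */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);
    return f;
}
```

For example, comparing 3 with 5 gives a result of -2, so SF is set. Comparing the most negative 32-bit value with 1 overflows to a positive result, so SF is clear and OF is set.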
21:52
<@AnnoDomini>
Damn it. Conditional statements are such a pain in assembly. :/
21:54
<@ToxicFrog>
It's just branches.
21:54
<@McMartin>
GOTO, M-Fer! DO YOU SPEAK IT!
21:55 * AnnoDomini laughs.
21:57
<@gnolam>
They say JMP and you say "How high?".
22:00
<@AnnoDomini>
Ohgodaforloop.
22:01
<@ToxicFrog>
Many processors have an instruction specifically for doing for loops.
22:01
<@ToxicFrog>
Even in the ones that don't it's usually pretty straightforward.
22:02
<@AnnoDomini>
Won't work for me here, as I need the counter to increase, rather than decrease. It'll just be easier to make it myself.
22:02
<@ToxicFrog>
So it decomposes into INC, CMP, BRA
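The x86 instruction ToxicFrog alludes to is `LOOP`, which decrements (E)CX and branches while it is nonzero. That is why it does not fit an increasing counter. The hand-built alternative is increment, compare, conditional branch. BRA is the branch mnemonic on some other architectures, and the x86 counterpart is a conditional jump such as `JL`. The C sketch below writes that shape with `goto` so each step lines up with one instruction. The function `sum_below` is a hypothetical example body.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1) with an increasing counter, written in the
   shape an assembly loop takes: body, increment, compare, branch. */
int sum_below(int n) {
    int i = 0, sum = 0;
    if (i >= n)          /* guard: skip the body entirely when n <= 0 */
        goto done;
top:
    sum += i;            /* loop body */
    i++;                 /* INC */
    if (i < n)           /* CMP i, n */
        goto top;        /* conditional branch, e.g. JL top on x86 */
done:
    return sum;
}
```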
22:15 Moltare [~moltare@82.32.73.ns-25785] has quit [Ping Timeout]
22:35
<@AnnoDomini>
Haaaate.
22:35
<@AnnoDomini>
Turns out some bastard put a nonworking implementation on Wikipedia.
22:35
<@AnnoDomini>
And nobody who tried to implement it has bothered to put a notice near the code.
23:12
<@Reiver>
...what
23:15
<@AnnoDomini>
Ah, excellent. This implementation actually looks like it works. I do seem to have some bugs in this code, though, as every time I run it, it crashes DOSBox.
23:23
<@AnnoDomini>
What I don't like is that I don't understand the compiler directives in use here. It's what the lecturer used, but damn him, he didn't explain it very well.
23:32 Finerty is now known as Vornicus
23:46
<@AnnoDomini>
Bug found - forgot RET at the end of subroutine.
23:47
<@AnnoDomini>
AWESOME.
23:47
<@AnnoDomini>
It actually works.
23:53 GeekSoldier is now known as GeekSoldier|bed
--- Log closed Mon Mar 10 00:00:14 2008