--- Log opened Sun Mar 09 00:00:04 2008 |
00:23 | | GeekSoldier|bed [~Rob@91.18.86.ns-26604] has quit [Ping Timeout] |
00:24 | | GeekSoldier|bed [~Rob@91.18.86.ns-26604] has joined #code |
00:29 | | GeekSoldier|bed [~Rob@91.18.86.ns-26604] has quit [Ping Timeout] |
00:29 | | AnnoDomini [AnnoDomini@83.21.32.ns-4025] has quit [Quit: (...) By this point, the astute reader has picked up that Nethack isn't a "game" as much as an extremely prolonged and extremely elaborate form of masochism. Ask any serious player.] |
01:30 | | Vornicus [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout] |
01:31 | | Vornotron [~vorn@Admin.Nightstar.Net] has joined #code |
01:31 | | You're now known as TheWatcher |
04:50 | | You're now known as TheWatcher[zZzZ] |
04:53 | | Vornotron is now known as Vornicus |
04:54 | | Vornicus is now known as NSGuest-5480 |
04:55 | | NSGuest-5480 is now known as Vornicus |
05:00 | | * Reiver finally clicks as to what the hell a tuple really is. |
05:00 | | * Reiver can't believe he'd struggled with the concept, given it is distinctly 'Durrr' stuff. >.< |
05:12 | < Vornicus> | Heh |
05:16 | | Reiver is now known as ReivShoppin |
05:16 | <@ReivShoppin> | Seriously! |
05:17 | <@ReivShoppin> | "A row of data" |
05:17 | <@ReivShoppin> | It'd had me puzzled in Python for ages >.> |
05:27 | < Vornicus> | *snrk* |
05:34 | | Thaqui [~Thaqui@Nightstar-123.jetstream.xtra.co.nz] has joined #code |
05:34 | | mode/#code [+o Thaqui] by ChanServ |
06:38 | | ReivShoppin is now known as Reiver |
07:00 | | GeekSoldier|bed [~Rob@Nightstar-8762.dip.t-dialin.net] has joined #code |
07:03 | | Vornicus [~vorn@ServicesOp.Nightstar.Net] has quit [Ping Timeout] |
07:04 | | GeekSoldier|bed is now known as GeekSoldier |
07:07 | | Vornicus [~vorn@Admin.Nightstar.Net] has joined #code |
07:07 | | mode/#code [+o Vornicus] by ChanServ |
07:34 | | AnnoDomini [AnnoDomini@83.21.32.ns-4025] has joined #Code |
07:34 | | mode/#code [+o AnnoDomini] by ChanServ |
07:36 | | Vornicus is now known as Vornicus-Latens |
08:05 | <@jerith> | Reiver: I always thought of it as "a read-only list". |
08:05 | <@Reiver> | jerith: Yeah, well, I'd been trying to get my head around the concept. |
08:05 | <@Reiver> | Now I do, problem solved~ |
08:07 | <@jerith> | :-) |
09:00 | | GeekSoldier [~Rob@Nightstar-8762.dip.t-dialin.net] has quit [Ping Timeout] |
09:09 | | Thaqui [~Thaqui@Nightstar-123.jetstream.xtra.co.nz] has left #code [Leaving] |
09:41 | | GeekSoldier [~Rob@Nightstar-9089.dip.t-dialin.net] has joined #code |
09:43 | | gnolam [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has joined #Code |
09:43 | | mode/#code [+o gnolam] by ChanServ |
09:54 | | Brother_Willibald [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has joined #Code |
09:55 | | Brother_Willibald [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has quit [Quit: *poof*] |
10:20 | | You're now known as TheWatcher |
11:30 | | AnnoDomini [AnnoDomini@83.21.32.ns-4025] has quit [Ping Timeout] |
11:31 | | AnnoDomini [AnnoDomini@83.21.28.ns-26444] has joined #Code |
11:31 | | mode/#code [+o AnnoDomini] by ChanServ |
12:29 | | eXeLaNCe [~dddd@88.245.15.ns-13237] has joined #code |
13:03 | | eXeLaNCe [~dddd@88.245.15.ns-13237] has quit [Quit: ] |
13:08 | < Moltare> | Idly, lads, I'm looking for the regex that means "Anything, including spaces, tabs and newlines, that comes between /* and */" |
13:08 | < Moltare> | I thought it was "/*"[. \t\n]*"*/" , but that doesn't seem to cut the mustard |
13:08 | <@McMartin> | It doesn't because it's allowing */s in the middle of it. |
13:09 | <@McMartin> | That said |
13:09 | <@McMartin> | I seem to recall that Tiger allows nested comments, so you're going to have to be more cunning about this |
13:09 | < Moltare> | Tiger doesn't |
13:09 | < Moltare> | Or, wate |
13:09 | < Moltare> | Tiger does, but I'm not trying to build a lexer for Tiger |
13:10 | < Moltare> | It's for a dodgy homebrew that our lecturer made up |
13:13 | <@McMartin> | Ah, OK |
13:13 | <@McMartin> | The problem you've hit is that [. \t\n]* means "as many of any character you can munch" |
13:13 | <@McMartin> | That's the entire file |
13:13 | <@McMartin> | You need to exclude the "*/" sequence from that middle bit. |
13:14 | < Moltare> | Well, my /current/ problem is that it doesn't find a "/*" at all; it finds a division operator followed by a multiplication operator |
13:14 | <@McMartin> | Aha. |
13:14 | < Moltare> | Despite the quotes |
13:14 | <@McMartin> | You need to put the comment-matcher higher up in the flex file so it will have higher priority. |
13:15 | < Moltare> | It was already above the others |
13:15 | | * Moltare puts it at the top instead |
13:15 | <@McMartin> | Hrm. |
13:15 | <@McMartin> | Maybe it then needs to be at the bottom~ |
13:22 | | * Moltare fiddle |
13:24 | < Moltare> | So, using "/*"[a-zA-Z0-9 \t\n]*"*/" to limit it to alphanumeric characters for the moment |
13:25 | < Moltare> | It recognises /* */ as a comment, but not /* comment */ |
13:26 | | * GeekSoldier tries to remember... "/blah/s"? |
13:28 | <@McMartin> | GeekSoldier: This is flex, not Perl |
13:28 | <@McMartin> | Moltare: OK, that boggles me |
13:29 | < GeekSoldier> | oh. |
13:29 | | * GeekSoldier returns to his corner. |
13:30 | <@McMartin> | Maybe it needs spaces between the "s and the []s? |
13:32 | < Moltare> | no appreciable difference |
13:34 | <@McMartin> | Does the space need to be escaped? |
13:34 | < Moltare> | It doesn't for the "ignore spaces, tabs and newlines" entry |
13:34 | <@McMartin> | Blarghlecopter. |
13:35 | <@McMartin> | What does it think of /*c*/? |
13:36 | <@McMartin> | I'm wondering if it's somehow getting boggled by more than one character or something |
13:37 | < Moltare> | Breaks in the exact same way |
13:37 | < Moltare> | /**/ and /* */ are fine, /*c*/ and /* c */ are not |
13:37 | <@McMartin> | How about /* */? |
13:38 | < Moltare> | Fine |
13:39 | <@McMartin> | /* 123 */? |
13:39 | < Moltare> | Not |
13:40 | <@McMartin> | This is several varieties of aggravating |
13:40 | <@McMartin> | The TeXInfo implies that it should work |
13:40 | <@McMartin> | What happens if you remove the 0-9 and try your test cases? |
13:41 | < Moltare> | Same |
13:41 | <@McMartin> | In case this is some bizarre heinousness where - works for letters but not numbers |
13:41 | <@McMartin> | Rargh. |
13:41 | <@McMartin> | OK |
13:41 | | * Moltare ponders, deletes everything but the .l file, recompiles from scratch just in case |
13:43 | < Moltare> | Ah, now it won't compile. This indicates progress of a backwards sort |
13:43 | <@McMartin> | Gnrk. |
13:43 | <@McMartin> | What's the error? |
13:44 | < Moltare> | FXD |
13:44 | < Moltare> | Spaces where they shouldn't be |
13:45 | < Moltare> | /**/: works /* */: works /*c*/: works /* c */: works |
13:45 | <@McMartin> | ... OK. |
13:45 | <@McMartin> | And now /* 123 */ won't. |
13:45 | < Moltare> | And the compiler was reusing an old version of something |
13:46 | < Moltare> | I put that back in, McM |
13:46 | <@McMartin> | Aha. |
13:46 | <@McMartin> | Good times then |
13:47 | < Moltare> | Now I need to change it from "alphanumeric characters" to "anything that isn't */", I take it |
13:48 | <@McMartin> | Right. |
13:48 | <@ToxicFrog> | Yes. |
13:48 | <@McMartin> | But, of course, /********/ needs to be legal. |
13:48 | <@McMartin> | So you can't just do [^*]* |
13:50 | < Moltare> | What about [^"*/"]*? Or will that just compare every character to */ and therefore never fire? |
13:51 | <@ToxicFrog> | That is "everything but ", *, and /", except that since flex uses " as well it probably won't work period |
13:51 | < Moltare> | ah |
13:52 | <@ToxicFrog> | I'm not entirely sure it's possible to handle /* comments */ using just regexes |
13:52 | <@McMartin> | It is. |
13:52 | <@McMartin> | You have to be a rat bastard about it, but it's doable. |
13:52 | <@McMartin> | What *isn't* doable with raw regex is nested comments. |
13:53 | <@McMartin> | Flex can do it by abusing state variables to make it limited-context-free, but. |
13:53 | < Moltare> | My rat bastardry skills are weak, as you may have noted ¬¬ |
13:54 | <@McMartin> | Basically, * is allowed *as long as it isn't followed by a slash*. |
13:55 | <@McMartin> | And you know how to say "something that isn't a slash" |
13:57 | < Moltare> | So it's ((^*|^/)|(*^/))* ? |
13:57 | < Moltare> | Not an asterisk or slash, or asterisk as long as a slash doesn't follow |
13:58 | <@McMartin> | That doesn't look remotely like flex syntax |
13:58 | < Moltare> | I used the wrong shape brackets there ¬¬ |
13:59 | <@McMartin> | ^ outside of the square brackets means "match the beginning of a line" |
13:59 | <@McMartin> | Also, /*////////*/ is an acceptable comment. |
14:01 | < Moltare> | [[^*|^/]|[*^/]|[^*/]]* ? That just complains at me ¬¬ |
14:04 | <@McMartin> | Yeah, you can't nest []s. |
14:05 | < Moltare> | How irritating. |
14:05 | <@McMartin> | [] is a special case in its own right. |
14:05 | <@McMartin> | [^*|^*/] is not an OR. |
14:05 | <@McMartin> | If you want "neither * nor /" that's [^*/]. |
14:05 | < Moltare> | Is that not "not *, followed by /"? |
14:06 | <@McMartin> | No. |
14:06 | <@McMartin> | Because [^*/] matches a single character. |
14:06 | <@McMartin> | Specifically, any character that is not * or /. |
14:06 | <@McMartin> | It's equivalent to [^/*]. |
14:06 | < Moltare> | Alright, then |
14:08 | < Moltare> | So how do I create ( /* followed by (either not * and not /, or * that isn't followed by /, or / that isn't preceded by *) an arbitrary number of times followed by */ ), then? |
14:08 | < Moltare> | (The lex manual I have here claims that | is an OR, idly) |
14:08 | <@McMartin> | Well, "/*" followed by (something) is easy. |
14:08 | <@McMartin> | Yes, | is indeed or. |
14:09 | <@McMartin> | However, when part of a [] token, | is "the vertical bar character" |
14:09 | <@McMartin> | Also, your last bit of the spec is wonky. |
14:09 | <@McMartin> | /* /* */ is a valid comment. |
14:09 | <@McMartin> | /* /* */ */ is a valid comment followed by a * and a /. |
14:10 | <@McMartin> | All that said |
14:11 | < Moltare> | /* /* */ fits, surely? It's /*, followed by / that is preceded by a space, followed by * that is followed by a space, followed by */ |
14:11 | <@McMartin> | You can use () to group stuff up |
14:11 | <@McMartin> | Oh, I see, I missed the "preceded" |
14:11 | <@McMartin> | Try a rephrase. |
14:12 | <@McMartin> | "Either a single character that isn't *, or a * followed by something that isn't /..." |
14:13 | < Moltare> | "/*"(^*|*^/)*"*/" is what I'd got it down to |
14:13 | <@McMartin> | Close. |
14:13 | <@McMartin> | You're missing some []s in strategic locations. |
14:13 | <@McMartin> | And possibly some ""s. |
14:14 | < Moltare> | ¬¬ Much as I appreciate the help, I begin to see why asking for it drives Jaci insane. I'm not looking to learn, here, I just want the damn thing working so I can put it in my past :P |
14:14 | | * Moltare fiddle some more, then |
14:15 | <@McMartin> | Moltare: And I've TAed this very class twice, and so I am deliberately nerfing myself, acting as if you were somebody wandering into my office hours. |
14:15 | <@McMartin> | I rather suspect this isn't going to help the attitude problems much. |
14:15 | < Moltare> | heh |
14:16 | < Moltare> | Victory, all the same |
14:16 | <@McMartin> | Good show |
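The log never shows the pattern Moltare ended up with, so what follows is only a sketch of one well-known answer. A commonly used flex pattern for non-nested comments is `"/*"([^*]|\*+[^*/])*\*+"/"`. It goes one step beyond the "a * followed by something that isn't /" wording above by treating runs of stars as a unit, so that `/***/` still matches. The C function below hand-codes the same state machine so the behaviour can be checked without flex; `match_comment` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the non-nested comment that
 * starts at s, or 0 if s does not begin with a complete comment. */
size_t match_comment(const char *s)
{
    size_t i;
    int seen_star = 0;   /* 1 when the previous body character was '*' */

    assert(s != NULL);
    if (s[0] != '/' || s[1] != '*')
        return 0;

    for (i = 2; s[i] != '\0'; i++) {
        if (seen_star && s[i] == '/')
            return i + 1;          /* first star-slash closes the comment */
        seen_star = (s[i] == '*');
    }
    return 0;                      /* ran out of input: unterminated */
}
```

Note that the star of the opening `/*` does not count towards the closer, so `/*/` is not a comment, while `/* /* */ */` yields a comment of length 8 followed by a stray ` */`, as McMartin describes.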
14:17 | <@McMartin> | I'm afraid I can't be a lot of help with a C-based recursive descent parser, though I can point you at ones that I wrote in Java and OCaml. The principles should be similar. =P |
14:19 | | * Moltare applies hard-won knowledge, fixes his string definition into the bargain |
14:21 | < Moltare> | Ahh.. or not. Because a string "foo" currently reports as a string with value "foo" rather than a string with value foo... |
14:21 | | * Moltare attempts to solo this one, first |
14:23 | | * McMartin goes to deal with breakfast |
14:35 | < Moltare> | Doesn't help that every time I go to write 'lexer' I write 'lever' |
14:35 | < Moltare> | Did it that time too |
14:44 | < Moltare> | Lunch! |
15:06 | | * gnolam snerks. |
15:06 | <@gnolam> | http://www.imdb.com/name/nm2469945/ |
15:09 | <@McMartin> | Hmm, and because I seem to have neglected to quote it in here: |
15:09 | <@McMartin> | "The defense grid can be full of lasery doom. The defense grid is not full of lasery doom." |
15:12 | | * Moltare replaces his printfs with something more useful to the nascent parser |
15:13 | < Moltare> | Understanding check, plz? |
15:13 | < Moltare> | A rule should return a T_SOMEKINDOFTOKEN and possibly an associated yylval |
15:14 | <@McMartin> | Urgh. I haven't used flex proper in long enough to be able to answer that with confidence. |
15:14 | < Moltare> | There is also a .h file, whatever one of those is, that lists structures of T_ALLTHETOKENS |
15:14 | <@McMartin> | That sounds about right. |
15:14 | < Moltare> | ie type, value |
15:14 | <@McMartin> | Yeah |
15:15 | < Moltare> | Then the parser itself calls the yyparse() thing generated in yy.lex.c by flex, breaks it into left,right&centre and recurses it in the face |
15:16 | < Moltare> | erm, yylex() thing |
15:16 | <@McMartin> | I don't recall if the actual token return ends up in a global too or not |
15:16 | < Moltare> | and a hash table is involved to check if variables are present or not |
15:17 | <@McMartin> | Well, that's your doing, not flex's. |
15:17 | < Moltare> | The hash table? yes |
15:17 | < Moltare> | I've got a fragment of code here: struct token { char *lexeme; int type; int value; }, but no idea what I'm supposed to be doing with it ¬¬ |
15:19 | <@McMartin> | At this point you're in "what your assignment is" territory and we're unlikely to be a lot of help. |
15:19 | <@McMartin> | (By which I mean "involving the spec of the assignment", not "we aren't going to do your homework for you") |
15:21 | < Moltare> | As far as I've worked it out: parser.c has the parse() method which does the actual parsing, and #includes a tokens.h file and a lex.yy.c file. |
15:21 | < Moltare> | lex.yy.c is what flex creates. |
15:22 | <@McMartin> | Right. |
15:22 | < Moltare> | tokens.h defines the token structure globally as having a type and possibly a value, and lists the type for a given token |
15:22 | <@McMartin> | And presumably, right now your parse() is just reading the stream and dumping it? |
15:22 | < Moltare> | (as a big column of #define T_COMMA 125; etc |
15:22 | <@McMartin> | Right |
15:23 | < Moltare> | Right now I have no parse(), as I've only just got the lexer putting out stuff on command |
15:23 | <@McMartin> | OK. |
15:23 | < Moltare> | That, I think, is step 1 |
15:23 | <@McMartin> | Aye. |
15:23 | < Moltare> | Get the tokens.h file and make it play nicely with a basic parse() method |
15:23 | <@McMartin> | So, parse() is going to be taking the output of the lexer as a stream of tokens, and turning it into some kind of (probably tree-recursive) structure. |
15:24 | < Moltare> | recursive-descent, as specified |
15:24 | < Moltare> | (in my spec, that is, not 'as I have already mentioned') |
15:24 | <@McMartin> | Well, that's the parser's implementation |
15:24 | <@McMartin> | By "tree-recursive" I mean that you're producing a list of Expressions or whatnot |
15:24 | < Moltare> | Oh. |
15:24 | < Moltare> | Yes. |
15:24 | <@McMartin> | And Expressions themselves can be made of expressions. |
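To make "Expressions themselves can be made of expressions" concrete, here is a minimal recursive-descent sketch in C. The grammar (integers, `+`, `*`, parentheses), the `Node` layout and every function name are assumptions chosen for illustration; the assignment's actual language is not shown in the log. It reads characters directly instead of calling a lexer, and nodes are never freed, which is tolerable only in a short-running program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST node: a leaf holds a number, an inner node an operator. */
typedef struct Node {
    char op;                    /* '+', '*', or 'n' for a number leaf */
    long value;                 /* only meaningful when op == 'n' */
    struct Node *left, *right;  /* only meaningful for operators */
} Node;

static const char *cursor;      /* next unread character of the input */

static Node *parse_expr(void);  /* forward declaration: the grammar recurses */

static Node *make_node(char op, long value, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch only: no real error handling */
    n->op = op; n->value = value; n->left = left; n->right = right;
    return n;
}

static void skip_spaces(void)
{
    while (*cursor == ' ')
        cursor++;
}

/* factor := NUMBER | '(' expr ')' */
static Node *parse_factor(void)
{
    char *end;
    long v;

    skip_spaces();
    if (*cursor == '(') {
        Node *inner;
        cursor++;
        inner = parse_expr();   /* an expression nested inside an expression */
        skip_spaces();
        assert(*cursor == ')'); /* a real parser would report a syntax error */
        cursor++;
        return inner;
    }
    assert(isdigit((unsigned char)*cursor));
    v = strtol(cursor, &end, 10);
    cursor = end;
    return make_node('n', v, NULL, NULL);
}

/* term := factor ('*' factor)* */
static Node *parse_term(void)
{
    Node *left = parse_factor();
    for (skip_spaces(); *cursor == '*'; skip_spaces()) {
        cursor++;
        left = make_node('*', 0, left, parse_factor());
    }
    return left;
}

/* expr := term ('+' term)* */
static Node *parse_expr(void)
{
    Node *left = parse_term();
    for (skip_spaces(); *cursor == '+'; skip_spaces()) {
        cursor++;
        left = make_node('+', 0, left, parse_term());
    }
    return left;
}

/* Entry point: builds the tree for text. */
Node *parse(const char *text)
{
    cursor = text;
    return parse_expr();
}

/* Walks the tree recursively, mirroring the way it was built. */
long eval(const Node *n)
{
    if (n->op == 'n') return n->value;
    if (n->op == '+') return eval(n->left) + eval(n->right);
    return eval(n->left) * eval(n->right);
}
```

Each grammar rule becomes one function, and precedence falls out of which function calls which: `parse_expr` only ever sees whole terms, so `1 + 2 * 3` groups the multiplication first.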
15:25 | <@McMartin> | Have you done anything with unions in C? |
15:25 | < Moltare> | No, and I note that your use of "with unions" is superfluous. |
15:25 | < Moltare> | I have never touched C before this assignment. ¬¬ |
15:26 | <@McMartin> | OK, so. |
15:26 | <@McMartin> | A union is sort of like a struct, except that all of the members overlap. |
15:26 | <@McMartin> | This lets you do horrifically awful things to memory, much like everything else in C. |
15:26 | <@McMartin> | More to the point, it's a way to get Polymorphism. |
15:26 | <@McMartin> | You go, say: |
15:26 | <@McMartin> | struct TOKEN { |
15:26 | <@McMartin> | int tag; |
15:26 | <@McMartin> | union { |
15:26 | <@McMartin> | char * stringval; |
15:26 | <@McMartin> | int intval; |
15:26 | <@McMartin> | } value; |
15:27 | <@McMartin> | }; |
15:27 | < Moltare> | OH, right |
15:27 | < Moltare> | So it can be either |
15:27 | < Moltare> | (what does the * represent?) |
15:27 | <@McMartin> | And then if you access the wrong value of the union you corrupt memory and possibly bring down the entire machine |
15:27 | <@McMartin> | "address of previous type" |
15:27 | <@McMartin> | C has no concept of strings. |
15:28 | <@McMartin> | Instead, you use an address of a character, and hope and pray that there is a null byte at an appropriate point in the future. |
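A sketch of how the tag decides which union member is safe to read, using McMartin's `struct TOKEN` layout from above. The tag constants `TAG_INT` and `TAG_STRING` and the function `describe` are made-up names for illustration. In practice, reading the wrong member usually yields garbage, and following a garbage pointer typically crashes the program, so the switch on `tag` is what keeps the accesses honest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { TAG_INT, TAG_STRING };   /* hypothetical tag values */

struct TOKEN {
    int tag;                    /* says which member of value is live */
    union {
        char *stringval;        /* address of a NUL-terminated char array */
        int intval;
    } value;                    /* both members share the same storage */
};

/* Writes a readable form of tok into buf (at most size bytes, NUL included). */
void describe(const struct TOKEN *tok, char *buf, size_t size)
{
    assert(tok != NULL && buf != NULL && size > 0);
    switch (tok->tag) {
    case TAG_INT:
        snprintf(buf, size, "int %d", tok->value.intval);
        break;
    case TAG_STRING:
        /* strlen walks forward until it meets the NUL byte */
        snprintf(buf, size, "string %s (%lu chars)", tok->value.stringval,
                 (unsigned long)strlen(tok->value.stringval));
        break;
    default:
        snprintf(buf, size, "unknown");
        break;
    }
}
```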
15:28 | <@gnolam> | Eh, the real usefulness of unions lies in serialization. |
15:28 | <@gnolam> | IMO. |
15:28 | < Moltare> | The more I hear of C, the more I wonder why everyone hates Java so much ¬¬ it seems to be far more intent on exsanguinating you |
15:28 | <@McMartin> | As an ML partisan, I beg to differ. They're for implementing Constructor types. |
15:29 | <@McMartin> | Moltare: C partisans feel that Java's inability to completely fuck you over for the tiniest mistake is an unconscionable assault on their freedom as a programmer. |
15:29 | <@ToxicFrog> | Moltare: C, unlike Java, is useful for implementing kernels, device drivers, and other low-level-but-we-don't-want-to-write-this-in-asm stuff. |
15:29 | <@McMartin> | And yes, said freedom is actually necessary for direct hardware control. |
15:30 | <@ToxicFrog> | The same features that make it useful for that also make it insanely dangerous, though~ |
15:30 | <@McMartin> | That said, when your professor said that this would be vastly easier in C, he was lying through his teeth. I suspect his actual intent was to make you actually implement stuff on your own instead of just handing it over to library classes. |
15:30 | <@McMartin> | Like, you know, String. |
15:30 | <@McMartin> | And HashMap. |
15:31 | <@ToxicFrog> | Quite. |
15:32 | < Moltare> | So, having created our token structure and given the appropriate #define T_COMMA someintegervalue in the token.h file, I then get the lexer to return T_COMMA when it hits "," in the program you hand it |
15:32 | < Moltare> | "," { return T_COMMA; } sort of thing |
15:33 | <@McMartin> | (Also, less hostilely, because the Java version of the Tiger book uses a totally different technique than the C/ML version, revolving around Visitors) |
15:33 | <@McMartin> | Mol: That sounds about right, yes. |
15:33 | <@McMartin> | IIRC, calls to yylex() will assign some global structure that will let you get the juicy datameats out once this is done |
15:33 | <@ToxicFrog> | Although generally T_COMMA would be part of an enum, rather than a straight #define. |
15:33 | <@McMartin> | TF: It's flex. It does its own thing. |
15:34 | < Moltare> | And when it's a variable I get the lexer to assign yytext to stringval and then return T_VAR? |
15:34 | <@McMartin> | Right. |
15:34 | <@ToxicFrog> | McMartin: no, you need to provide them yourself. |
15:34 | < Moltare> | {ID} { stringval = yytext; return T_VAR; } |
15:34 | <@McMartin> | And then parse() needs to know that a T_VAR means you need to read stringval. |
15:34 | <@ToxicFrog> | That's what y.tab.h is for, but if you aren't using yacc, that doesn't get generated. |
15:34 | <@McMartin> | Aha |
15:34 | | * McMartin has never used flex alone, so. |
15:35 | <@ToxicFrog> | So instead you need to write your own (say) tokens.h, and #include it in your lexer and parser |
15:35 | <@McMartin> | Aha. |
15:35 | < Moltare> | TF: Which is why I need to write the token.h file and # |
15:35 | < Moltare> | right |
15:35 | <@McMartin> | Anyway, he's right. |
15:35 | <@McMartin> | Instead of #define T_COMMA etc. |
15:35 | <@ToxicFrog> | And the contents are something like: enum Tokens { T_COMMA, T_SEMICOLON, T_OPENPAREN, T_STRING, T_INT, ..., T_NUMTOKENTYPES } |
15:35 | < Moltare> | I note I've never heard of enum; what's the distinction? |
15:36 | <@ToxicFrog> | Enum creates symbols rather than macros. |
15:36 | <@ToxicFrog> | #define is basically a global search-and-replace. |
15:36 | <@ToxicFrog> | Enum creates what are, in effect, constants with automatically assigned values. |
15:39 | < Moltare> | I thought global search-and-replace was what I was doing here |
15:39 | <@ToxicFrog> | ... |
15:39 | < Moltare> | "When you see T_COMMA, read it as 145 and give that to the "type" variable" |
15:40 | < Moltare> | Or am I totally lost again? |
15:40 | <@ToxicFrog> | It is what you are doing with #define, yes |
15:40 | <@McMartin> | See, the idea here is that it's better to say "read it as something unique, I don't care what" |
15:40 | <@ToxicFrog> | As a general rule, though, you don't want to do that if you don't have to; and using an enum guarantees that all the values are unique without you having to worry about that, too. |
15:44 | < Moltare> | Fair enough; but then what goes in "tag" in McM's example struct above? since we're not giving them integer tag values |
15:45 | <@McMartin> | enums are secretly integers |
15:45 | <@McMartin> | What enum does is abstract out what the actual value is |
15:46 | <@McMartin> | (Also, "tag" in this case is actually what yylex() is returning) |
15:46 | < Moltare> | um |
15:46 | <@McMartin> | (when you return T_COMMA or what not, that value is assignable to an int variable) |
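The difference can be seen in a few lines. This sketch reuses ToxicFrog's enum with the elided entries simply left out, so the numeric values are specific to this shortened list. The `token_names` table and `token_name` function are hypothetical additions that show one use for the trailing `T_NUMTOKENTYPES` entry.

```c
#include <assert.h>

/* Shortened copy of the enum from the log; values are assigned 0, 1, 2, ... */
enum Tokens { T_COMMA, T_SEMICOLON, T_OPENPAREN, T_STRING, T_INT,
              T_NUMTOKENTYPES /* not a token: counts the entries above */ };

/* Hypothetical table of printable names, one per token type. */
static const char *const token_names[] = {
    "comma", "semicolon", "open paren", "string", "int"
};

/* Returns a printable name; type is a plain int because enum constants
 * convert to int without a cast. */
const char *token_name(int type)
{
    /* Catches the table and the enum drifting out of step. */
    assert((int)(sizeof token_names / sizeof token_names[0])
           == T_NUMTOKENTYPES);
    if (type < 0 || type >= T_NUMTOKENTYPES)
        return "unknown";
    return token_names[type];
}
```

Adding a new token type before `T_NUMTOKENTYPES` renumbers everything after it automatically, which is the bookkeeping a column of `#define`s would leave to the programmer.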
15:47 | < Moltare> | And flex knows to drop the value of T_COMMA into tag automatically? |
15:48 | <@McMartin> | I don't believe so, now that you mention it - I defer to TF on how the API actually works. |
15:48 | <@McMartin> | I used it merely as an example. |
15:48 | <@McMartin> | You'd have some *other* enum for expression types - and that's where tags would go and such |
15:49 | <@McMartin> | You'd just be reading return values and the global yytext value when interpreting lexemes. |
15:49 | <@ToxicFrog> | Alternately, have flex construct and return the token struct |
15:49 | <@McMartin> | Oh god, the memory management hassles =( |
15:49 | <@McMartin> | It's going to be bad enough with the AST. |
15:50 | <@ToxicFrog> | [0-9]+ { Token * tok = new_token(); tok.type = T_INTEGER; tok.value.intval = atol(yytext); return tok; } |
15:51 | <@ToxicFrog> | IME, this makes the code more clear while making memory management slightly trickier. |
15:51 | <@ToxicFrog> | But not hugely trickier; it's just callee-allocates, caller-frees. |
15:52 | <@McMartin> | With added ugliness if you need to duplicate yytext's values; you'll need to ensure that either all strvals are safe to free, or that none of them need to be. |
15:52 | < Moltare> | I note the lecturer specifically stated "don't bother freeing memory, it's not worth it for this" |
15:52 | <@ToxicFrog> | But yes. Flex doesn't know anything about what structures you're using, or tags, or anything. |
15:52 | <@ToxicFrog> | When it gets a match, it sets yytext to the actual text that matched, executes the corresponding code, and that's it. |
15:53 | <@McMartin> | Is yytext a global or an argument of some kind? |
15:53 | <@ToxicFrog> | Global. extern const char * yytext, IIRC. |
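One consequence of yytext being a global that flex rewrites on every match: the text it points at is only dependable until the next token is scanned, so a token that wants to keep its lexeme needs its own copy. A hedged sketch of such a copy routine follows; `copy_lexeme` is a hypothetical helper, not part of flex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a private, heap-allocated copy of text,
 * or NULL if memory is exhausted. The caller owns the copy. */
char *copy_lexeme(const char *text)
{
    size_t len;
    char *copy;

    assert(text != NULL);
    len = strlen(text);
    copy = malloc(len + 1);        /* +1 for the terminating NUL byte */
    if (copy != NULL)
        memcpy(copy, text, len + 1);
    return copy;
}
```

In a rule action this might be used along the lines of `tok->value.stringval = copy_lexeme(yytext);`, assuming a token struct with a `stringval` member like the ones discussed here.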
15:56 | < Moltare> | So if I do it that way, I don't have to play around with token.h? just define the token struct at the top of the lexer and have it return tokens which potentially have values added? |
15:56 | <@ToxicFrog> | You still need token.h |
15:56 | <@ToxicFrog> | Otherwise, how does the parser tell what kind of token it is? |
15:56 | <@McMartin> | Otherwise the value T_INTEGER or whatnot won't exist. |
15:57 | < Moltare> | Right, but it'd just be a list of "This is T_INTEGER; it has an intval in it. This is T_COMMA; it has nothing in it. This is T_..." |
15:58 | <@McMartin> | Yeah. |
15:58 | <@McMartin> | And actually, the "it has a FOO in it" can be implicit. |
15:58 | <@ToxicFrog> | Er |
15:58 | <@ToxicFrog> | ? |
15:58 | <@McMartin> | As long as it's unique, and parse() only ever reads the right value, Life Is Good. |
15:58 | <@ToxicFrog> | Wouldn't it be a list of enums and a -single- struct-union definition? |
15:59 | <@McMartin> | Well, if you're making a Universal Token Type. |
15:59 | <@McMartin> | I'm imagining a case where you're communicating solely through a stream of globals and return values a la yytext. |
15:59 | < Moltare> | If the struct-union definition is in lexer.l's definitions, would you need it in token.h too? |
15:59 | <@McMartin> | It would probably be better to have it only be in token.h, unless there's some bizarre part of lexer.l I'm not grokking |
16:00 | <@ToxicFrog> | Moltare: it would be -only- in token.h |
16:00 | <@ToxicFrog> | Which is then #included by both the lexer and the parser |
16:00 | < Moltare> | oh, right |
16:00 | <@ToxicFrog> | Thus, they get the same definition for the token types, and for the layout of a Token struct, and they agree on everything |
16:03 | <@ToxicFrog> | http://lua.pastey.net/83554 -- a very simple example which assumes tokens only need to worry about int or string values (or no values) |
16:03 | <@ToxicFrog> | So, .tag is set to T_<something>, so that you can tell what kind of token a given Token struct is. |
16:03 | <@ToxicFrog> | And if that type has an associated value (say, T_INTEGER or T_STRING), the corresponding .value.<type>val is filled in. |
16:04 | < Moltare> | Makes sense |
16:04 | <@ToxicFrog> | So, the lexer can create and populate a Token struct appropriately for each token; and the parser can then look at that struct and figure out what kind it is and what value, if any, it has. |
16:04 | <@ToxicFrog> | (and then based on that the parser does the actual parsing thing) |
16:05 | < Moltare> | And do I need to define new_token() somewhere? |
16:05 | <@McMartin> | Yeah. That's essentially a one-liner |
16:05 | <@McMartin> | return malloc (sizeof (Token)); |
16:06 | <@McMartin> | "Give me a chunk of uninitialized memory of this size" |
16:06 | <@McMartin> | If you want it to be zeroed by default, use calloc |
16:06 | <@ToxicFrog> | Token * new_token() { return malloc(sizeof(Token)); } /* create enough memory to hold a Token and return a pointer to it */ |
16:06 | < Moltare> | Oh, right, the whole 'manage your own memory' thing |
16:06 | <@McMartin> | Any malloc()ed memory will need to be manually free()ed when you're done with it. |
16:06 | <@McMartin> | And for God's sake, only ever free() it once, and don't access it after it's been free()ed. |
16:06 | < Moltare> | It doesn't need free()ing, sez lecturer |
16:07 | <@ToxicFrog> | In this case you probably don't need to worry about that, because it's a short-running program and the OS will free it all when it exits. |
16:07 | <@ToxicFrog> | Which is what the lecturer is saying. |
16:07 | <@ToxicFrog> | (sidenote: if you have "Token * foo", you access its internals with "foo->tag", rather than "Token foo" and "foo.tag") |
16:08 | <@ToxicFrog> | (and since you are now playing with memory management and pointers, you will indeed have Token * foo) |
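Putting the last few messages together, a sketch of `new_token()` and pointer access. The `Token` layout follows the tag-plus-union shape described above; the pastey.net example itself is not reproduced in the log, so the field names and the shortened enum are assumptions. `calloc` is used in place of `malloc` so a fresh token starts zeroed, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum TokenType { T_COMMA, T_INTEGER, T_STRING };  /* assumed, shortened list */

typedef struct Token {
    int tag;                       /* one of enum TokenType */
    union {
        long intval;
        char *stringval;
    } value;
} Token;

/* Callee allocates; returns NULL if memory is exhausted. */
Token *new_token(void)
{
    return calloc(1, sizeof(Token));
}

/* Hypothetical convenience wrapper for integer tokens. */
Token *new_int_token(long n)
{
    Token *tok = new_token();
    if (tok != NULL) {
        tok->tag = T_INTEGER;      /* tok is a pointer, so -> rather than . */
        tok->value.intval = n;     /* value is a plain member, so . here */
    }
    return tok;
}

/* Hypothetical accessor: refuses to read intval from the wrong kind of token. */
long token_int_value(const Token *tok)
{
    assert(tok != NULL && tok->tag == T_INTEGER);
    return tok->value.intval;
}
```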
16:10 | | Vornicus-Latens [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout] |
16:15 | < Moltare> | C doesn't natively do scientific notation, does it? |
16:16 | <@ToxicFrog> | Yes it does. |
16:16 | < Moltare> | Ah? handy |
16:16 | <@ToxicFrog> | double foo = 1.0e+06; /* compiles! */ |
16:16 | < Moltare> | I tried to look it up but found no references that weren't C++ or C# |
16:17 | <@McMartin> | Is atof() smart enough to read those? |
16:17 | <@ToxicFrog> | Yes. |
16:17 | <@ToxicFrog> | Well, strtod() is |
16:17 | <@ToxicFrog> | And the atof man page says the behaviour is "identical to strtod except it does not report errors" |
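A small sketch of reading scientific notation with `strtod`, which, unlike `atof`, lets the caller detect bad input through its end pointer. `parse_double` is a hypothetical wrapper name, and this version does not check for out-of-range values.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: stores the parsed value in *out and returns 1 if the
 * whole of text was a valid number, otherwise returns 0 and leaves *out
 * untouched. */
int parse_double(const char *text, double *out)
{
    char *end;
    double v;

    assert(text != NULL && out != NULL);
    v = strtod(text, &end);
    if (end == text || *end != '\0')
        return 0;                  /* nothing parsed, or trailing junk */
    *out = v;
    return 1;
}
```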
16:18 | <@McMartin> | Ah yes, another grand C tradition. |
16:18 | <@McMartin> | In other news, gets() is still required by all conforming runtimes. |
16:18 | | * ToxicFrog heads off to campus. Later! |
16:18 | | * McMartin goes to perform his ablutions. |
16:19 | < Moltare> | hooray for stuff! |
16:20 | < Moltare> | (and thanks for your patience) |
16:20 | <@McMartin> | (flex ends up being an actually useful tool in its own right, for all kinds of stuff) |
16:20 | <@McMartin> | (Granted, in nearly all of these cases you should for the love of God not be writing it in C) |
16:25 | | gnolam [lenin@Nightstar-10613.8.5.253.static.se.wasadata.net] has quit [Ping Timeout] |
16:26 | | gnolam [lenin@85.8.5.ns-20483] has joined #Code |
16:26 | | mode/#code [+o gnolam] by ChanServ |
16:30 | < Moltare> | 31 errors, woo |
16:30 | < Moltare> | Although 28 of them appear to be identical |
16:30 | <@McMartin> | Forgotten commas? |
16:32 | < Moltare> | No, it's complaining about enum token { T_BLAH... } and struct token { stuff here } |
16:33 | < Moltare> | Conflicting types, previous declarations, and something about not being able to return voids which I think might be a result of the first one. Also lots of whining about token which is clearly relating to the first issue |
16:33 | <@McMartin> | Ah, yes. |
16:34 | <@McMartin> | (I think you want typedef enum { T_BLAH... } TOKEN; and typedef struct token_struct { ... } token; ) |
16:37 | < Moltare> | yay, different errors |
16:37 | < Moltare> | I don't even understand what the first one is saying, this time ¬¬ |
16:37 | < Moltare> | Says, "In function `struct token * new_token()':" |
16:38 | < Moltare> | Or is that just "All of the following errors are here" or similar? |
16:39 | <@McMartin> | Yeah, that's "look here for what's going on" |
16:39 | < Moltare> | Then there's one "ANSI C++ forbids implicit conversion from `void *' in return", and a shitload of "request for member `type' in `tok', which is of non-aggregate type `token *'" and "return to `int' from `token *' lacks a cast |
16:39 | < Moltare> | " |
16:40 | <@McMartin> | OK, the shitload is of you using .type instead of ->type |
16:41 | <@McMartin> | The ANSI C++ thing can DIAF and should only be a warning anyway, since you aren't *writing* C++... |
16:41 | <@McMartin> | If you want to get rid of it, make it return (token *)malloc (etc) |
16:42 | < Moltare> | ta |
16:42 | <@McMartin> | The "return to int from token *' lacks a cast" implies to me that your function declaration either forgot to declare a return type, or the people calling it don't know about its types |
16:42 | <@McMartin> | If the former, declare new_token as "token * new_token(void)" |
16:43 | <@McMartin> | If the latter, add the prototype "token *new_token(void);" - with the semicolon - to token.h |
16:43 | <@McMartin> | After the definition of the type |
16:43 | < Moltare> | (Is it token->value.intval or token->value->intval, idly?) |
16:44 | <@McMartin> | (token->value.intval, as value is not a pointer) |
16:44 | < Moltare> | (excellent, got something right) |
16:47 | < Moltare> | (with the result that I've only got the 'lacks a cast' ones left *fiddlefiddle*) |
16:47 | <@McMartin> | C assumes that any function it's never heard of returns an int and takes any number of untyped arguments |
16:47 | <@McMartin> | This is Always A Horrifically Bad Idea, so you need to type-declare them first in the header files. |
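A sketch of what a header along these lines might look like, following McMartin's typedef suggestion. The include guard, the shortened token list and the `simple_token` helper are assumptions; the log does not show Moltare's actual files. Giving the enum and the struct different names matters because, in C, `enum token` and `struct token` compete for the same tag name, which is one plausible source of the "conflicting types" errors reported above.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what might live in token.h (assumed layout) ---- */
#ifndef TOKEN_H
#define TOKEN_H

typedef enum { T_IF, T_COMMA, T_INTEGER } TOKEN;   /* shortened list */

typedef struct token_struct {
    TOKEN type;
    union { long intval; char *stringval; } value;
} token;

/* Prototype: tells every caller the return type before the first call, so
 * the compiler never has to guess "int". (void) means "takes no arguments". */
token *new_token(void);

#endif /* TOKEN_H */

/* ---- what might live in a .c file that includes token.h ---- */
token *new_token(void)
{
    /* The cast is redundant in C; it is here only because the errors in the
     * log come from a C++ compiler, which refuses the implicit conversion. */
    return (token *)malloc(sizeof(token));
}

/* Hypothetical helper for value-less tokens such as T_IF or T_COMMA. */
token *simple_token(TOKEN type)
{
    token *tok = new_token();
    assert(tok != NULL);           /* sketch only: no real error handling */
    tok->type = type;
    return tok;
}
```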
16:49 | < Moltare> | how random. Why int? |
16:49 | <@McMartin> | Because C's predecessor language only had two types; int and int*. |
16:50 | < Moltare> | Isn't that a bit... limiting? |
16:50 | <@McMartin> | With "int" defined as "whatever size you can shove into the hardware's register" |
16:50 | <@McMartin> | This would have been the late 60s/early 70s. |
16:50 | <@McMartin> | The idea that you could write "x+y*z" and have it work out operator precedence and assign temporary registers and stuff was still Hot Shit. |
16:51 | <@McMartin> | Though not *brand* new, the way it was for FORTRAN. |
16:51 | <@McMartin> | So called because it was a FORmula TRANSlator, and thus astonishing and new |
16:53 | < Moltare> | hm. Adding the prototype breaks the "forbids implicit conversion" thing again. And then removing it doesn't unbreak it. |
16:53 | < Moltare> | Za. |
16:53 | <@McMartin> | OK, the prototype should be there anyway |
16:54 | <@McMartin> | Where's the "implicit conversion" error? |
16:54 | <@McMartin> | And what's the line that produces it? |
16:54 | < Moltare> | I'd tell you, but it's gone again |
16:55 | <@McMartin> | Hmm. |
16:55 | < Moltare> | Also Dev-C++ is now telling me it's out of memory, and not letting me close it |
16:55 | <@McMartin> | whut |
16:56 | | * Moltare process-kills it, starts it up again, shrugs |
16:58 | < Moltare> | Right, now it's back to only doing the lacks-a-cast thing |
16:58 | <@McMartin> | What line produces it? |
16:58 | < Moltare> | Any line of the .l file that attempts to return a token |
16:59 | < Moltare> | "if" { token * tok = new_token(); tok->type = T_IF; return tok; } and its ilk |
16:59 | < Moltare> | (hence 26 of the original 31 errors being identical) |
16:59 | <@McMartin> | Aha |
16:59 | <@McMartin> | This sounds like yylex() isn't being prototyped. |
16:59 | <@McMartin> | Maybe add a token *yylex(void); to token.h too? |
16:59 | <@McMartin> | I'm stabbing in the dark here |
17:00 | <@McMartin> | If yylex() has a forced prototype, then you're kind of screwed |
17:01 | < Moltare> | 15 tokens.h |
17:01 | < Moltare> | ambiguates old declaration `struct token * yylex()' |
17:01 | < Moltare> | New and shiny extra error from that |
17:01 | <@McMartin> | Where was the old declaration? |
17:01 | < Moltare> | I never made the old declaration |
17:01 | < Moltare> | Presumably flex did it for me |
17:02 | <@McMartin> | Hm. Somewhere in lexer.l you've defined "struct token *" |
17:02 | <@McMartin> | Turn that to just "token *" if you did the typedef |
17:03 | < Moltare> | I've not defined "struct token *" anywhere |
17:03 | <@McMartin> | Hum |
17:03 | <@McMartin> | Can you paste lexer.l somewhere? |
17:04 | < Moltare> | Certainly |
17:05 | < Moltare> | Can't send the url |
17:05 | < Moltare> | oh, wate |
17:05 | < Moltare> | no voice :P |
17:05 | < Moltare> | (pm'd) |
17:13 | | You're now known as TheWatcher[afk] |
17:19 | | mode/#code [+v Moltare] by ChanServ |
17:29 | <+Moltare> | Hashtable next, then! |
17:29 | <+Moltare> | - stores variables |
17:29 | <+Moltare> | - has place(thing) and find(thing) |
17:30 | <+Moltare> | if find(thing) fails, uses place(thing) |
17:30 | <+Moltare> | - is basically just like I'd do it in Java? |
17:30 | <+Moltare> | oh, and |
17:30 | <+Moltare> | - goes in the parser .c file? |
17:40 | | * Moltare makes a shoddy first draft, leaves it for now |
17:41 | < C_tiger> | Mol: if you haven't got the regex: \/\*(.*?)\*\/ will work for you |
17:41 | <+Moltare> | I got it, but thank you all the same :) |
17:42 | < C_tiger> | Yeah, there was a little too much upscroll to read. |
17:43 | < C_tiger> | But that's literal /* (any character as many times as needed but MINIMAL number of times so the rest of the regex fits) literal */ |
17:43 | < C_tiger> | parentheses unnecessary. |
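
The non-greedy `.*?` in that regex amounts to "stop at the first closing delimiter". A hand-rolled C sketch of the same minimal match follows; the helper name `find_comment` is made up here, and unlike `.` in most regex flavours this version also spans newlines.

```c
#include <stddef.h>
#include <string.h>

/* Find the first C-style block comment in s.
 * On success, store its start in *start and its total length
 * (both delimiters included) in *len, then return 1.
 * Return 0 if there is no complete comment. */
int find_comment(const char *s, const char **start, size_t *len)
{
    const char *open = strstr(s, "/*");
    if (open == NULL)
        return 0;
    /* Begin after the opener so its own star cannot be reused as a closer. */
    const char *close = strstr(open + 2, "*/");
    if (close == NULL)
        return 0;
    *start = open;
    *len = (size_t)(close + 2 - open);
    return 1;
}
```

Because the search stops at the first closer, two comments on one line are matched separately instead of being swallowed as one.
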
17:51 | | Vornotron [~vorn@Admin.Nightstar.Net] has joined #code |
17:57 | <@McMartin> | Also, for the record, from the PM discussion |
17:58 | <@McMartin> | If you're using flex and you want to return something that isn't an integer, you have to #define YY_DECL to be the (semicolon-free) prototype for your lexer function, yylex. |
17:59 | <@McMartin> | Otherwise you'll get type conflicts, which are The Lose |
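
flex's macro for this is spelled `YY_DECL`, and the generated scanner uses it as the header of the function it defines, so redefining it changes the return type of `yylex`. The sketch below imitates that by hand in plain C without running flex; the `token` type, the token kinds, and the one-character "scanner" body are assumptions for illustration only.

```c
#include <stdlib.h>

typedef struct token { int type; } token;   /* hypothetical token type */
enum { T_EOF = 0, T_CHAR = 1 };              /* hypothetical token kinds */

/* In a real lexer.l this #define would go in the %{ ... %} prologue. */
#define YY_DECL token *yylex(void)

YY_DECL;    /* the macro carries no semicolon, so adding one makes a prototype */

static const char *input = "";   /* stand-in for flex's input buffer */

/* flex emits "YY_DECL { ...scanner... }"; this body is a toy replacement
 * that returns one T_CHAR token per input character, then T_EOF. */
YY_DECL
{
    token *tok = (token *)malloc(sizeof *tok);
    if (tok == NULL)
        return NULL;
    if (*input != '\0') {
        tok->type = T_CHAR;
        input++;
    } else {
        tok->type = T_EOF;
    }
    return tok;
}
```

Since the definition and every caller now agree on `token *yylex(void)`, rules such as `return tok;` stop producing the "lacks a cast" complaint.
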
18:39 | | Vornotron [~vorn@Admin.Nightstar.Net] has quit [Ping Timeout] |
18:46 | <@AnnoDomini> | Hm. Would anyone know where I could find an implementation of the Bresenham algorithm in assembly? Preferably x86, but most anything will do. |
18:47 | | Vornotron [~vorn@Admin.Nightstar.Net] has joined #code |
18:59 | | You're now known as TheWatcher |
19:04 | | * AnnoDomini will try converting the pseudocode from Wikipedia, then. |
19:06 | < Vornotron> | What's the subject? |
19:06 | <@AnnoDomini> | Bresenham line algorithm in assembly. |
19:06 | < Vornotron> | Aha |
19:08 | <@AnnoDomini> | We're supposedly given a pseudocode for it in the materials for the class, but it looks to be the basic version, which won't help me. |
19:08 | <@AnnoDomini> | And it doesn't work, either. |
19:17 | <@McMartin> | I have C code for it in my first edition Graphics Gems book. |
19:18 | <@McMartin> | I believe I used it to create an assembler Bresenham's for the C64. |
19:18 | < Vornotron> | Bresenham is pretty easy, in the end. |
19:18 | <@McMartin> | Which is the wrong chip. =P |
19:18 | <@McMartin> | Yes. |
19:18 | <@McMartin> | SDL_gfx also has an implementation of it, I believe. |
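
For reference, a C sketch of the integer-only, all-octant form of Bresenham's line algorithm. This is not the Graphics Gems or SDL_gfx code mentioned above; the callback type `plot_fn` and the `point_buf` recorder are names invented for this example. It may be a more convenient starting point for an assembly port than the basic single-octant version.

```c
#include <stdlib.h>

/* Callback invoked once per pixel; ctx is passed through untouched. */
typedef void (*plot_fn)(int x, int y, void *ctx);

/* Integer-only Bresenham line from (x0,y0) to (x1,y1), any direction. */
void bresenham_line(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0);
    int dy = -abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;    /* step direction in x */
    int sy = (y0 < y1) ? 1 : -1;    /* step direction in y */
    int err = dx + dy;              /* combined error term, always integral */

    for (;;) {
        plot(x0, y0, ctx);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}

/* Example callback: append points to a fixed-size buffer. */
typedef struct { int x[64], y[64]; int n; } point_buf;

void record_point(int x, int y, void *ctx)
{
    point_buf *b = (point_buf *)ctx;
    if (b->n < 64) {
        b->x[b->n] = x;
        b->y[b->n] = y;
        b->n++;
    }
}
```

The loop uses only additions, comparisons, and a doubling, which map directly onto ADD, CMP, SHL, and conditional jumps.
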
20:02 | | Vornotron is now known as Finerty |
20:08 | <@MyCatVerbs> | McMartin: type conflicts in _C_ of all language are Double Lose, with Extra Lose on the Side. |
20:11 | | Attilla [~The.Attil@194.72.70.ns-11849] has quit [Quit: <Insert Humorous and/or serious exit message here>] |
20:14 | <@ToxicFrog> | Moltare: concerning the hash table: typically this would go in a separate file, say, hash.c |
20:15 | <+Moltare> | Oh, which I then #include in parser.c? |
20:15 | <@ToxicFrog> | It gets a corresponding header, hash.h, which other files that use it #include (and contains function declarations and suchlike) |
20:15 | <@ToxicFrog> | No, you #include the hash.h |
20:15 | <@ToxicFrog> | The actual function _code_ goes in hash.c |
20:15 | <@ToxicFrog> | Which then gets combined with the rest of the program at link time. |
20:16 | <@ToxicFrog> | (also: it occurs to me that if you're using Dev-C++, the "ANSI C++ forbids..." warnings might be because it's trying to compile it as C++; double check your project settings) |
20:17 | | * gnolam ponders launching into one of his anti-Dev-C++ rants again. |
20:17 | | * GeekSoldier gets the popcorn. |
20:17 | <@ToxicFrog> | gnolam: hold it until after Mol is done with his homework, please? |
20:18 | <@ToxicFrog> | And for the record, I suggested just using gcc/MSYS directly, since this is a small project. |
20:18 | <@ToxicFrog> | Moltare: anyways. The idea is that a .c file holds actual code. A .h file holds struct, enum, and function declarations, #defines, and whatnot - everything that other .c files need in order to make use of that code. |
20:19 | <@ToxicFrog> | At build time, all the .c become .o (object code), and those are all combined into your executable or library. |
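
A sketch of how that split might look for the variable table, shown as one listing with comments marking which part would go in which file. Everything here is an assumption for illustration: the names `hash_find` and `hash_place` (standing in for the find(thing) and place(thing) mentioned earlier), the bucket count, and the choice of chaining with a djb2-style string hash.

```c
#include <stdlib.h>
#include <string.h>

/* ---- hash.h: declarations that other .c files #include ---- */
#define HASH_BUCKETS 64

typedef struct hash_entry {
    char *name;                   /* variable name (owned copy) */
    int value;                    /* variable's current value */
    struct hash_entry *next;      /* chain of entries sharing a bucket */
} hash_entry;

typedef struct hash_table {
    hash_entry *buckets[HASH_BUCKETS];
} hash_table;

hash_entry *hash_find(hash_table *t, const char *name);
hash_entry *hash_place(hash_table *t, const char *name);

/* ---- hash.c: the actual code, which would #include "hash.h" ---- */
static unsigned hash_string(const char *s)
{
    unsigned h = 5381;            /* djb2-style string hash */
    while (*s != '\0')
        h = h * 33u + (unsigned char)*s++;
    return h % HASH_BUCKETS;
}

/* Return the entry for name, or NULL if it has not been placed yet. */
hash_entry *hash_find(hash_table *t, const char *name)
{
    hash_entry *e = t->buckets[hash_string(name)];
    while (e != NULL && strcmp(e->name, name) != 0)
        e = e->next;
    return e;
}

/* Return the existing entry for name, creating it (value 0) if needed. */
hash_entry *hash_place(hash_table *t, const char *name)
{
    hash_entry *e = hash_find(t, name);
    if (e != NULL)
        return e;
    e = (hash_entry *)malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = (char *)malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->name, name);
    e->value = 0;
    unsigned i = hash_string(name);
    e->next = t->buckets[i];      /* push onto the front of the chain */
    t->buckets[i] = e;
    return e;
}
```

parser.c would then contain only `#include "hash.h"` and calls to the two functions; `hash_string` is `static`, so it stays private to hash.c.
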
20:22 | <@gnolam> | McMartin: you wouldn't happen to have code for Bresenham ellipses in there somewhere as well? |
20:22 | <@McMartin> | (The key difference here is that C, unlike Java or Python, compiles each code file in a vacuum) |
20:22 | <@McMartin> | gnolam: I have no idea; the book is buried somewhere in my closet |
20:27 | | Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has quit [Ping Timeout] |
20:27 | | Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has joined #code |
20:32 | | Attilla [~The.Attil@194.72.70.ns-11849] has joined #code |
20:33 | | Moltare [~moltare@Nightstar-29340.cable.ubr02.bath.blueyonder.co.uk] has quit [Ping Timeout] |
20:40 | | Moltare [~moltare@82.32.73.ns-25785] has joined #code |
20:42 | <@ToxicFrog> | Wibs. |
20:42 | <@AnnoDomini> | Question - what flag signifies the presence of a negative number in the x86 architecture? |
20:43 | <@AnnoDomini> | SF? |
20:43 | <@AnnoDomini> | Can't seem to find anything on x86 flags. |
20:45 | <@ToxicFrog> | Yes, it's SF |
20:46 | <@ToxicFrog> | Stands for "Sign Flag" |
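
As a rough model in C: after an arithmetic instruction, SF is a copy of the most significant bit of the result. The function names below are made up for this sketch, and 16-bit operands are assumed since the code is running under DOSBox.

```c
#include <stdint.h>

/* Model of SF after a 16-bit operation: a copy of the result's top bit. */
int sign_flag16(uint16_t result)
{
    return (result >> 15) & 1;
}

/* SF as it would be left by SUB or CMP on two 16-bit operands. */
int sign_flag_after_sub16(uint16_t a, uint16_t b)
{
    return sign_flag16((uint16_t)(a - b));
}
```

Note that signed comparisons also involve the overflow flag: JL, for example, branches when SF differs from OF, while JS tests SF alone.
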
21:52 | <@AnnoDomini> | Damn it. Conditional statements are such a pain in assembly. :/ |
21:54 | <@ToxicFrog> | It's just branches. |
21:54 | <@McMartin> | GOTO, M-Fer! DO YOU SPEAK IT! |
21:55 | | * AnnoDomini laughs. |
21:57 | <@gnolam> | They say JMP and you say "How high?". |
22:00 | <@AnnoDomini> | Ohgodaforloop. |
22:01 | <@ToxicFrog> | Many processors have an instruction specifically for doing for loops. |
22:01 | <@ToxicFrog> | Even in the ones that don't it's usually pretty straightforward. |
22:02 | <@AnnoDomini> | Won't work for me here, as I need the counter to increase, rather than decrease. It'll just be easier to make it myself. |
22:02 | <@ToxicFrog> | So it decomposes into INC, CMP, BRA |
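
The same decomposition can be written in C with labels and gotos, which mirrors the shape the assembly takes. This is only an illustrative sketch; `sum_below` is a made-up function that adds up 0 through n-1 with an upward-counting loop.

```c
/* Sum 0..n-1 using only labels and gotos, mirroring the assembly shape. */
int sum_below(int n)
{
    int total = 0;
    int i = 0;              /* initialise the counter */
    goto test;              /* check the condition before the first pass */
body:
    total += i;             /* loop body */
    i++;                    /* INC */
test:
    if (i < n)              /* CMP followed by a conditional branch */
        goto body;
    return total;
}
```

Jumping to the test first means the body is skipped entirely when n is zero or negative, just as with a C `for` loop.
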
22:15 | | Moltare [~moltare@82.32.73.ns-25785] has quit [Ping Timeout] |
22:35 | <@AnnoDomini> | Haaaate. |
22:35 | <@AnnoDomini> | Turns out some bastard put a nonworking implementation on Wikipedia. |
22:35 | <@AnnoDomini> | And nobody who tried to implement it has bothered to put a notice near the code. |
23:12 | <@Reiver> | ...what |
23:15 | <@AnnoDomini> | Ah, excellent. This implementation actually looks like it works. I do seem to have some bugs in this code, though, as every time I run it, it crashes DOSBox. |
23:23 | <@AnnoDomini> | What I don't like is that I don't understand the compiler directives in use here. It's what the lecturer used, but damn him, he didn't explain it very well. |
23:32 | | Finerty is now known as Vornicus |
23:46 | <@AnnoDomini> | Bug found - forgot RET at the end of subroutine. |
23:47 | <@AnnoDomini> | AWESOME. |
23:47 | <@AnnoDomini> | It actually works. |
23:53 | | GeekSoldier is now known as GeekSoldier|bed |
--- Log closed Mon Mar 10 00:00:14 2008 |