--- Log opened Sun Jan 20 00:00:52 2019 |
00:19 | <&McMartin> | All right. Time to see if I broke everything |
00:21 | | Derakon[AFK] is now known as Derakon |
02:26 | <&McMartin> | Heh, cute |
02:27 | <&McMartin> | ARMv4 has an Unsigned Multiply Long instruction UMULL destl, desth, op1, op2 - multiplies 2 32-bit numbers to produce one 64-bit result across two registers |
02:27 | <&McMartin> | op1 isn't allowed to be one of the dest operands (later chips make this allowed, and apparently no actual chips enforced this) |
02:27 | <&McMartin> | But op2 is, and the assembler I'm using will swap the order of the arguments if it can make that fit |
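
For reference, a minimal C sketch of what UMULL computes, under the operand order McMartin gives above. The function name `umull` and its pointer parameters are made up for illustration; they are not from any real library.

```c
#include <stdint.h>

/* 32x32 -> 64 unsigned multiply, with the result split across two
   32-bit halves the way UMULL spreads it across two registers.
   Hypothetical helper, named after the instruction for illustration. */
static void umull(uint32_t *dest_lo, uint32_t *dest_hi,
                  uint32_t op1, uint32_t op2)
{
    uint64_t product = (uint64_t)op1 * op2;

    *dest_lo = (uint32_t)product;          /* bits 0..31  */
    *dest_hi = (uint32_t)(product >> 32);  /* bits 32..63 */
}
```

Multiplication is commutative, so exchanging op1 and op2 cannot change either half of the result. That is why the assembler is free to swap them when only one ordering satisfies the register restriction.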
02:33 | | * McMartin fiddles with his ARM code, realizes he can shave two bytes off his x86 code. |
02:43 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code |
02:46 | | Kindamoody is now known as Kindamoody[zZz] |
02:49 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds] |
03:01 | | * McMartin then realizes he can shave three more bytes, and two memory operations, off the x86 code. |
03:01 | <&McMartin> | Now the ARM and x86 code are the same size. >_> |
03:18 | | Degi [Degi@Nightstar-72nf2v.dyn.telefonica.de] has quit [Connection reset by peer] |
03:24 | <&Reiver> | wut |
03:25 | <&McMartin> | ARM and x86 are different kinds of chips, built and designed in very different ways |
03:25 | <&McMartin> | So instructing them to do roughly the same thing will look, at the chip level, very different |
03:25 | <&McMartin> | However, for this rather messy function I am computing, the code for computing it is exactly the same size even though the two functions are organized completely differently. |
03:26 | <&McMartin> | They also both use the same number of registers, which is usually not something that happens, because ARM has lots of registers and uses that to make up for the fact that it has trouble with large hardcoded constants |
03:26 | <&McMartin> | But these all sort of balanced out here |
03:26 | <&McMartin> | The part I was improving was "I have two 64-bit numbers, and I want to multiply them together into a 128-bit number, but I only actually care about bits 32 through 63." |
03:27 | <&McMartin> | And I had a considerable amount of wasted work on the x86 code, including juggling some values around that I could instead simply refrain from trashing |
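
A sketch of the arithmetic behind "bits 32 through 63", written in C with only 32-bit pieces. This is not McMartin's code, which is not shown in the log; the function name `mul64_bits_32_63` is hypothetical. It assumes a platform where `unsigned int` is 32 bits, so that `uint32_t` multiplication wraps modulo 2^32.

```c
#include <stdint.h>

/* Bits 32..63 of the 128-bit product a*b, built from 32-bit halves.
   Hypothetical helper for illustration only. */
static uint32_t mul64_bits_32_63(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* a_lo*b_lo occupies bits 0..63; only its high half reaches
       the bits we want.  This is the one 32x32->64 multiply. */
    uint32_t from_low = (uint32_t)(((uint64_t)a_lo * b_lo) >> 32);

    /* The cross terms start at bit 32, so only their low 32 bits
       matter.  a_hi*b_hi starts at bit 64 and is never computed. */
    return from_low + a_lo * b_hi + a_hi * b_lo;
}
```

So the wanted bits cost one widening multiply and two truncating ones, with no carries to propagate because everything above bit 63 is discarded.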
03:28 | <&[R]> | "ARM has lots of registers and uses that to make up for the fact that it has trouble with large hardcoded constants" <-- I'm kind of curious what x86 does to make that less of a problem. Wouldn't the resulting code still have to load the constant into a register to do something? |
03:29 | <&McMartin> | Nope! It feeds the constant from the instruction directly into the ALU. |
03:29 | <&McMartin> | It is completely legal to say IMUL EBX, 0x12345678 |
03:29 | <&McMartin> | Which multiplies EBX by that value. |
03:29 | <&McMartin> | The instruction is, admittedly, ten bytes long |
03:30 | <&McMartin> | Er |
03:30 | <&McMartin> | six bytes long. |
03:30 | <&McMartin> | ARM, meanwhile, you have to basically say... |
03:30 | <&McMartin> | ... well, OK |
03:31 | <&McMartin> | What you say is LDR r0, =&12345678; MUL r1, r0, r1 |
03:32 | <&McMartin> | But LDR r0, ={whatever} puts {whatever} in a read-only data table somewhere else, computes the distance of that table entry from the instruction in question, and emits LDR r0, [r15, #nnnn] with the nnnn as computed. |
03:32 | <&McMartin> | (r15 is the program counter) |
03:32 | <&McMartin> | (You can do computed GOTO by doing math with it, or virtual dispatch or function returns by assigning variables/table entries to it) |
03:36 | <&McMartin> | 32-bit ARM (but not 16- or 64-bit) also has this incredibly wacky thing where you can arbitrarily bitshift one of the arguments on the way in to almost any instruction |
03:36 | <&McMartin> | So while the x86 code for what I was doing had a whole bunch of fancy multiprecision bitshift instructions in it, the ARM code did not... |
03:37 | <&McMartin> | ... but it lost no space or time on this because I could fold the many more operations needed to execute multiprecision bitshifts "by hand" into the rest of the computation as it went. <3 |
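
To make "multiprecision bitshifts by hand" concrete, here is a minimal C sketch of shifting a 64-bit value held in two 32-bit halves. The name `shr64_by_hand` is hypothetical, and the sketch only covers shift counts strictly between 0 and 32.

```c
#include <stdint.h>

/* Shift the 64-bit value hi:lo right by n bits using only 32-bit
   operations.  Requires 0 < n < 32; other counts would need separate
   handling.  Hypothetical helper for illustration only. */
static void shr64_by_hand(uint32_t *hi, uint32_t *lo, unsigned n)
{
    /* The low word keeps its own upper bits and receives the n bits
       that fall off the bottom of the high word. */
    *lo = (*lo >> n) | (*hi << (32u - n));
    *hi >>= n;
}
```

On x86 the low-word line is roughly what a single SHRD does. On 32-bit ARM each of the two shifts can ride along as the shifted operand of a MOV or ORR, which is the folding described above.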
03:38 | <&McMartin> | (In each case that code ended up just about as tight as it could be made, with a little bit of slack on the x86 side that it needed in order to do bits of the later computation, so, no overall penalty paid.) |
03:39 | <&McMartin> | And even with that, in the end each architecture used exactly 4 32-bit registers to do all its work. |
03:42 | <&McMartin> | OTOH, a platform I'd *like* to have this routine for would be the Genesis, but its CPU, despite having 32-bit registers, not only lacks a "multiply two 32-bit numbers, get a 64-bit number" instruction, like x86 has had since the 386 and ARM has had since the original Game Boy Advance... |
03:42 | <&McMartin> | ... it doesn't even have a "multiply two 32-bit numbers, get a 32-bit truncated number" instruction ;_; |
03:42 | <&McMartin> | (The 68000's multiply instruction is 16x16->32 and it is the only one it has) |
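
A minimal C sketch of how a truncated 32x32->32 multiply can be assembled from 16x16->32 multiplies of the kind the 68000 offers. `mulu16` is a stand-in for the hardware multiply and `mul32_trunc` is a hypothetical name; neither comes from the log.

```c
#include <stdint.h>

/* Stand-in for a 16x16 -> 32 unsigned hardware multiply. */
static uint32_t mulu16(uint16_t x, uint16_t y)
{
    return (uint32_t)x * y;
}

/* Low 32 bits of a*b using three 16x16->32 multiplies.
   Hypothetical helper for illustration only. */
static uint32_t mul32_trunc(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* Cross terms start at bit 16; overflow past bit 31 is discarded.
       a_hi*b_hi starts at bit 32 and is not needed at all. */
    uint32_t cross = mulu16(a_lo, b_hi) + mulu16(a_hi, b_lo);

    return mulu16(a_lo, b_lo) + (cross << 16);
}
```

A full 64-bit result would need the fourth product (a_hi times b_hi) plus carry propagation between the partial sums, which is where the extra work on that CPU comes from.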
03:44 | <&McMartin> | The routine in question is a PRNG, and while it's not cryptographically strong, it's better than every random number generator in libc. On x86 and ARMv4 it's also only 100 bytes long (92 bytes ROM, 8 bytes RAM). |
03:46 | <&McMartin> | I suppose the 68k way to do it would be to just hit RAM harder. |
06:46 | | Vorntastic [uid293981@Nightstar-6br85t.irccloud.com] has joined #code |
06:46 | | mode/#code [+qo Vorntastic Vorntastic] by ChanServ |
10:42 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code |
11:56 | | Kindamoody[zZz] is now known as Kindamoody |
12:18 | | Degi [Degi@Nightstar-qb8cbe.dyn.telefonica.de] has joined #code |
12:33 | | Kindamoody is now known as Kindamoody|afk |
12:35 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds] |
15:23 | <&[R]> | https://twitter.com/da_667/status/1086874402959097856 |
16:50 | | Kindamoody|afk is now known as Kindamoody |
17:59 | | Degi [Degi@Nightstar-qb8cbe.dyn.telefonica.de] has quit [Ping timeout: 121 seconds] |
18:14 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code |
18:25 | | * McMartin has a glorious and terrible vision |
18:41 | <&McMartin> | also wat |
18:41 | <&McMartin> | "the Raspberry Pi, a powerful "micro-computer" that is used for digital-making and coding" |
18:42 | | * McMartin gabefaces |
18:42 | <&McMartin> | I suppose this is technically correct |
18:42 | <&McMartin> | (the best kind of correct!) |
18:42 | <&[R]> | -*- McMartin has a glorious and terrible vision <-- a vision about Poettering dying, but everyone continues to use his shitware anyways? |
18:43 | <&McMartin> | No, this involved hard-coding working binary search trees as C89 literals |
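
The log gives no details of the vision, so the following is only one guess at what a binary search tree written as a C89 literal could look like. The struct layout, the names `bst_node`, `tree` and `bst_contains`, and the keys 1 through 7 are all assumptions made for illustration.

```c
#include <stddef.h>

struct bst_node {
    int key;
    const struct bst_node *left;
    const struct bst_node *right;
};

/* A balanced tree over the keys 1..7, laid out entirely at compile
   time.  The array name is already in scope inside its own
   initializer, so nodes can point at one another by address. */
static const struct bst_node tree[7] = {
    { 4, &tree[1], &tree[2] },   /* root */
    { 2, &tree[3], &tree[4] },
    { 6, &tree[5], &tree[6] },
    { 1, NULL, NULL },
    { 3, NULL, NULL },
    { 5, NULL, NULL },
    { 7, NULL, NULL }
};

/* Ordinary iterative lookup; returns 1 if key is present, else 0. */
static int bst_contains(const struct bst_node *node, int key)
{
    while (node != NULL) {
        if (key == node->key) {
            return 1;
        }
        node = (key < node->key) ? node->left : node->right;
    }
    return 0;
}
```

Calling `bst_contains(tree, 5)` starts at the root in `tree[0]` and walks right to 6, then left to 5. Nothing is allocated or linked at run time.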
18:46 | | Vorntastic [uid293981@Nightstar-6br85t.irccloud.com] has quit [[NS] Quit: Connection closed for inactivity] |
19:51 | | himi [sjjf@Nightstar-v37cpe.internode.on.net] has quit [Ping timeout: 121 seconds] |
21:33 | | Reiv [NSkiwiirc@Nightstar-ih0uis.global-gateway.net.nz] has joined #code |
21:33 | | mode/#code [+o Reiv] by ChanServ |
22:01 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds] |
22:02 | | himi [sjjf@Nightstar-1drtbs.anu.edu.au] has joined #code |
22:02 | | mode/#code [+o himi] by ChanServ |
22:10 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code |
22:10 | | mode/#code [+o Alek] by ChanServ |
22:40 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds] |
22:43 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code |
22:43 | | mode/#code [+o Alek] by ChanServ |
22:49 | | Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds] |
23:27 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds] |
23:31 | | Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code |
23:31 | | mode/#code [+o Alek] by ChanServ |
--- Log closed Mon Jan 21 00:00:53 2019 |