--- Log opened Sun Nov 12 00:00:28 2017 |
00:45 | | Vornlicious [Vorn@Nightstar-fbrltd.sub-70-211-132.myvzw.com] has quit [Connection closed] |
00:45 | | Vorntastic [Vorn@Nightstar-1l3nul.res.rr.com] has joined #code |
01:25 | | Kindamoody is now known as Kindamoody[zZz] |
01:32 | | celmin|away is now known as celticminstrel |
03:07 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving] |
03:07 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code |
04:18 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [Connection closed] |
04:27 | | RchrdB [RchrdB@Nightstar-qe9.aug.187.81.IP] has quit [Connection closed] |
05:01 | | Derakon is now known as Derakon[AFK] |
05:07 | | Vornlicious [Vorn@Nightstar-h0nrf1.sub-70-211-140.myvzw.com] has joined #code |
05:10 | | Vorntastic [Vorn@Nightstar-1l3nul.res.rr.com] has quit [Ping timeout: 121 seconds] |
05:22 | | Soare [cute@Nightstar-gvt3mb.ip-164-132-106.eu] has joined #code |
05:23 | | abilal [a@Nightstar-lgpmok.dfri.se] has quit [Ping timeout: 121 seconds] |
06:00 | | celticminstrel is now known as celmin|sleep |
07:44 | | Kindamoody[zZz] is now known as Kindamoody |
09:05 | | Vornicus [Vorn@Nightstar-1l3nul.res.rr.com] has quit [Ping timeout: 121 seconds] |
09:20 | | macdjord|dance is now known as macdjord |
10:35 | | macdjord is now known as macdjord|slep |
10:43 | < Vornlicious> | Whee more date conversions. This one: "12L17" is today. (It skips "I" as a month letter) |
10:49 | | Kindamoody [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Client exited] |
10:52 | | Kindamoody|autojoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code |
10:52 | | mode/#code [+o Kindamoody|autojoin] by ChanServ |
11:09 | | Kindamoody|autojoin is now known as Kindamoody|out |
11:19 | < Vornlicious> | Whee. Code(letter)-index({64;65;96;97},Match(code(letter), {64;73;96;105})) |
11:26 | <@gnolam> | 12L17? What kind of date format is that supposed to be? |
11:29 | < Vornlicious> | Some crazy bullshit that some beverage bottlers use in their manufacturing codes. Day, month (as a letter, a-m skipping I), two digit year |
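A minimal sketch of decoding the day / month-letter / two-digit-year code described above. The function name, the `MONTH_LETTERS` table, and the default century are assumptions for illustration; the code itself does not carry a century.

```python
from datetime import date

# Month letters A-M with I skipped, as described in the chat.
MONTH_LETTERS = "ABCDEFGHJKLM"


def parse_day_letter_year(code: str, century: int = 2000) -> date:
    """Parse codes like '12L17': day, month letter, two-digit year.

    The century default is an assumption; the code does not encode it.
    """
    day = int(code[:-3])                                # one or two leading digits
    month = MONTH_LETTERS.index(code[-3].upper()) + 1   # raises ValueError for 'I'
    year = century + int(code[-2:])
    return date(year, month, day)


print(parse_day_letter_year("12L17"))  # 2017-11-12
```

Looking the letter up in a table that simply omits "I" avoids the offset arithmetic on character codes.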
11:30 | | gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has quit [Ping timeout: 121 seconds] |
11:31 | | Kindamoody|out [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds] |
11:31 | < Vornlicious> | And now he will never know |
11:36 | | gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has joined #code |
11:36 | | mode/#code [+o gnolam] by ChanServ |
11:37 | < Vornlicious> | Some crazy bullshit that some beverage bottlers use in their manufacturing codes. Day, month (as a letter, a-m skipping I), two digit year |
11:38 | <@gnolam> | (Power outage) |
11:39 | <@gnolam> | Ah. |
11:39 | <@gnolam> | justwhy.gif |
11:40 | < Vornlicious> | Also popular is 7315, ones digit of year and then day of year |
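The year-digit plus day-of-year form can be sketched the same way. The function name and the `decade_start` parameter are assumptions; a single year digit is ambiguous across decades, so the caller has to supply one.

```python
from datetime import date, timedelta


def parse_year_digit_ordinal(code: str, decade_start: int = 2010) -> date:
    """Parse codes like '7315': ones digit of the year, then day of year.

    decade_start is an assumption supplied by the caller, since the
    code only carries the last digit of the year.
    """
    year = decade_start + int(code[0])
    day_of_year = int(code[1:])
    # Day 1 is January 1, so add (day_of_year - 1) days to it.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(parse_year_digit_ordinal("7315"))  # 2017-11-11
```

This sketch does not check that the day of year is in range for the chosen year.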
11:42 | <@gnolam> | ... what is wrong with your bottlers |
11:42 | | Kindamoody|autojoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code |
11:42 | | mode/#code [+o Kindamoody|autojoin] by ChanServ |
11:42 | < Vornlicious> | A startlingly large number of things. |
12:16 | | Soare [cute@Nightstar-gvt3mb.ip-164-132-106.eu] has quit [Ping timeout: 121 seconds] |
13:11 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code |
13:32 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code |
14:14 | | VirusJTG [VirusJTG@Nightstar-42s.jso.104.208.IP] has quit [Connection reset by peer] |
14:14 | | VirusJTG [VirusJTG@Nightstar-42s.jso.104.208.IP] has joined #code |
14:14 | | mode/#code [+ao VirusJTG VirusJTG] by ChanServ |
15:39 | | Jessikat` [Jessikat@Nightstar-r1fphs.dab.02.net] has joined #code |
15:39 | | celmin|sleep is now known as celticminstrel |
15:41 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [Ping timeout: 121 seconds] |
17:05 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code |
17:06 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [[NS] Quit: Leaving] |
17:06 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code |
17:12 | | macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code |
17:12 | | mode/#code [+o macdjord] by ChanServ |
17:12 | | macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds] |
17:44 | | Kindamoody|autojoin is now known as Kindamoody |
18:11 | | mac [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code |
18:11 | | mode/#code [+o mac] by ChanServ |
18:13 | | macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds] |
19:13 | | RchrdB [RchrdB@Nightstar-qe9.aug.187.81.IP] has joined #code |
19:25 | | macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code |
19:25 | | mode/#code [+o macdjord|slep] by ChanServ |
19:28 | | mac [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds] |
19:42 | | Kindamoody is now known as Kindamoody|afk |
19:51 | | KiMo|autorejoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code |
19:54 | | Kindamoody|afk [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds] |
20:01 | | Vornicus [Vorn@Nightstar-1l3nul.res.rr.com] has joined #code |
20:02 | | mode/#code [+qo Vornicus Vornicus] by ChanServ |
20:02 | | gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has quit [[NS] Quit: Computer maintenance] |
20:11 | | Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [Ping timeout: 121 seconds] |
20:15 | | himi [sjjf@Nightstar-v37cpe.internode.on.net] has quit [Ping timeout: 121 seconds] |
20:17 | | gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has joined #code |
20:17 | | mode/#code [+o gnolam] by ChanServ |
20:19 | | IRCFrEAK [GK-1WM-SU@Nightstar-820.c0d.45.5.IP] has joined #code |
20:20 | | IRCFrEAK [GK-1WM-SU@Nightstar-820.c0d.45.5.IP] has quit [RecvQ exceeded] |
20:23 | | IRCFrEAK [g_k_800k@Nightstar-ji9.phg.27.23.IP] has joined #code |
20:24 | | IRCFrEAK [g_k_800k@Nightstar-ji9.phg.27.23.IP] has quit [RecvQ exceeded] |
20:42 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving] |
20:42 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code |
21:14 | <&[R]> | https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe <-- TIL POSIX locale stuff is completely screwy |
21:16 | <&McMartin> | "Everything uses UTF-8 for "char" and what doesn't is broken and terrible anyway." |
21:16 | <&McMartin> | This is an important PSA: NEVER USE UTF-8 INTERNALLY, ONLY AT THE EDGES. |
21:16 | <&McMartin> | Use UCS-4 internally. |
21:17 | <&McMartin> | UTF-8 strings are not indexable. |
21:18 | <&[R]> | UCS-4 is what? |
21:19 | <&McMartin> | 32-bit integers, one per Unicode code point. |
21:20 | <&McMartin> | UTF-8 is a variable-length encoding, and one of the rather important string operations is "character at offset X" |
21:20 | <&McMartin> | You do not want that to be O(x). |
21:21 | <&McMartin> | You do not want your substring operations to contain half or a third of a code point in them. |
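A small sketch of the two points above, using Python byte buffers to stand in for C arrays. The helper `code_point_at` is a hypothetical name; UTF-32-LE is used here as the fixed-width (UCS-4 style) form.

```python
text = "na\u00efve caf\u00e9"      # "naïve café", with precomposed ï and é

utf8 = text.encode("utf-8")        # variable width: 1 to 4 bytes per code point
utf32 = text.encode("utf-32-le")   # fixed width: 4 bytes per code point


def code_point_at(buf: bytes, index: int) -> str:
    """Constant-time lookup of code point `index` in a UTF-32-LE buffer."""
    return buf[4 * index: 4 * index + 4].decode("utf-32-le")


print(code_point_at(utf32, 2))     # ï

# "ï" occupies bytes 2 and 3 of the UTF-8 buffer, so cutting at byte 3
# leaves half a code point at the end of the slice.
try:
    utf8[:3].decode("utf-8")
    slice_ok = True
except UnicodeDecodeError:
    slice_ok = False
print(slice_ok)                    # False
```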
21:27 | <&McMartin> | UTF-8 is fantastic for exactly those cases where you can treat "string" as "opaque, immutable binary blob" |
21:34 | | KM|autorejoin [Kindamoody@Nightstar-k1m8bj.mobileonline.telia.com] has joined #code |
21:35 | | KM|autorejoin is now known as Kindamoody |
21:35 | | mode/#code [+o Kindamoody] by ChanServ |
21:37 | | KiMo|autorejoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds] |
21:40 | | Soare [mm@Nightstar-vg7om8.danwin1210.me] has joined #code |
21:56 | | Jessikat [Jessikat@Nightstar-mob28h.dab.02.net] has joined #code |
21:58 | | Jessikat` [Jessikat@Nightstar-r1fphs.dab.02.net] has quit [Ping timeout: 121 seconds] |
21:58 | | macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code |
21:58 | | mode/#code [+o macdjord] by ChanServ |
22:01 | | macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds] |
22:27 | | Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving] |
22:38 | | himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has joined #code |
22:38 | | mode/#code [+o himi] by ChanServ |
22:57 | < RchrdB> | FWIW, just because your data is in UCS-4 or UTF-32, that doesn't by itself mean the operation "index into it" has a meaning that you'd like or expect. Unicode codepoints don't correspond 1:1 with characters on screen. The existence of things like combining characters means that if you index into a sequence of UTF-32 codepoints at a random position, you may actually be indexing into the middle of a grapheme cluster. (A grapheme cluster is defined |
22:57 | < RchrdB> | as "thing that looks like a character on screen, and which the text editing cursor should usually treat as an atomic unit for the purposes of selection with the mouse and the left and right and backspace keys.") |
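A short illustration of that point, assuming the accented letter is stored as a base letter plus a combining mark. Python strings index by code point, so they behave like UCS-4 here.

```python
import unicodedata

# "é" written as "e" + U+0301 COMBINING ACUTE ACCENT:
# two code points, one character on screen.
s = "cafe\u0301"

print(len(s))                                   # 5
print(unicodedata.combining(s[4]) != 0)         # True: s[4] is a combining mark

# Cutting at code point 4 separates the accent from its base letter.
head, tail = s[:4], s[4:]
print(head)                                     # cafe

# NFC normalization happens to merge this particular pair into one code
# point, but not every cluster has a precomposed form.
print(len(unicodedata.normalize("NFC", s)))     # 4
```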
23:00 | < RchrdB> | You kind of can index into a UTF-8 string at a random point; the encoding is designed so that if you index into a UTF-8 string at a random byte, you can, without any ambiguity, get from the byte you're looking at, which may well be in the middle of a codepoint, to the start/end of the next/previous codepoint boundary by looking at only a small constant number of bytes to the right or left of the one you're currently looking at (I think 6 bytes in the original design; at most 4 per codepoint since RFC 3629). |
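The resynchronization described above can be sketched as follows, assuming the buffer is valid UTF-8. The function name is hypothetical. Since RFC 3629 limits a code point to 4 bytes, the loop steps back at most 3 times on valid input.

```python
def code_point_start(buf: bytes, pos: int) -> int:
    """Return the offset of the first byte of the code point containing buf[pos].

    Assumes buf is valid UTF-8. Continuation bytes have the bit pattern
    10xxxxxx, so step left until a byte that is not a continuation byte.
    """
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


data = "a\u20acb".encode("utf-8")    # "a€b"; "€" is the 3 bytes E2 82 AC
print(code_point_start(data, 3))     # 1
```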
23:03 | < RchrdB> | There's a programming language called "Emily" made by Andi McClure which gets unicode really, really right. I think the way she did this is that text strings are UTF-8 bytes in memory and they have a bunch of different methods that return different kinds of iterators; one for iterating by byte, one for iterating by codepoint, one for iterating by grapheme cluster. |
23:03 | < RchrdB> | I have a vague notion that at least one other programming language did about the same thing but I can't remember. |
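A rough sketch of the "several iterators over one string" idea. This is not Emily's API; all three function names are hypothetical. The cluster iterator is a deliberate simplification that only groups a base code point with the combining marks after it; full grapheme segmentation (Unicode UAX #29) has more rules.

```python
import unicodedata
from typing import Iterator


def iter_bytes(s: str) -> Iterator[int]:
    """Iterate over the UTF-8 bytes of the string."""
    yield from s.encode("utf-8")


def iter_code_points(s: str) -> Iterator[str]:
    """Iterate over code points (what Python's str iteration already does)."""
    yield from s


def iter_clusters_approx(s: str) -> Iterator[str]:
    """Approximate grapheme clusters: base code point plus trailing combining marks."""
    cluster = ""
    for ch in s:
        # A non-combining code point starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


s = "e\u0301a"   # "e" + combining acute accent, then "a"
print(len(list(iter_bytes(s))),
      len(list(iter_code_points(s))),
      len(list(iter_clusters_approx(s))))   # 4 3 2
```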
23:03 | <&McMartin> | Yeah, two clarifications on that |
23:04 | < RchrdB> | ? |
23:04 | <&McMartin> | (a) I'm intentionally ignoring the issue of grapheme clusters/glyphs, because anything past "codepoints" is largely agreed to be something that machines should only have to deal with occasionally and at the human-interaction level |
23:05 | <&McMartin> | (b) Iterators are O(n) for access and that's part of what I'm considering bad |
23:05 | < RchrdB> | What meaningful processing can you implement with only indexing into codepoints? |
23:06 | <&McMartin> | substring, split |
23:07 | <&McMartin> | You *can* do it with bytes |
23:07 | <&McMartin> | But your life is harder unless you're doing one of the ones that UTF-8 was intentionally designed to make work just like it would for Latin-1 |
23:09 | <&McMartin> | Also, if you're working in C |
23:09 | <&McMartin> | Which you are, because that's what the link was about |
23:09 | < RchrdB> | Substring on codepoints is kind of a weird buggy operation anyway, since it can rip grapheme clusters in half by accident. |
23:10 | <&McMartin> | In some cases that's even correct behavior! |
23:10 | <&McMartin> | But yes, the usual issue you solve with this is that when you allocate space for a string of at least N length you don't have to measure twice. |
23:12 | | himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has quit [Ping timeout: 121 seconds] |
23:12 | < RchrdB> | I have a vague memory of hearing that one of the new fashionable compiled PLs like Go or Rust or something had a regex library where you ask it to do operations on unicode codepoints and it builds automata that implement the thing you're asking for but do it on the UTF-8 bytes instead of via a separate expensive decoding step. |
23:13 | <&McMartin> | Rust does that. |
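A hand-written illustration of the idea, not how Rust's regex crate is implemented: a code point range is translated into a pattern over UTF-8 bytes, so the haystack never has to be decoded. The range and the pattern name are assumptions chosen for the example. U+0391 through U+03A9 (Greek capitals) all encode as the lead byte 0xCE followed by one byte in 0x91..0xA9.

```python
import re

# Byte-level pattern equivalent to the code point class [\u0391-\u03A9].
GREEK_CAPS_UTF8 = re.compile(rb"\xce[\x91-\xa9]")

haystack = "\u03b1\u03a9 x \u03a3".encode("utf-8")   # "αΩ x Σ" as UTF-8 bytes

matches = [m.group().decode("utf-8") for m in GREEK_CAPS_UTF8.finditer(haystack)]
print(matches)   # ['Ω', 'Σ']
```

The lowercase alpha (bytes CE B1) shares the lead byte but falls outside the second-byte range, so it is skipped without any decoding step.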
23:13 | <&McMartin> | Rust strings are also, however, immutable blobs. |
23:14 | <&McMartin> | Rust also uses WTF-8 |
23:14 | <&McMartin> | Which covers a certain infelicity introduced by UTF-16. |
23:17 | < RchrdB> | are wtf-8 and cesu-8 the same thing? |
23:18 | <&McMartin> | No. WTF-8 can losslessly send strings that UTF-16 rejects there-and-back. |
23:18 | <&McMartin> | WTF-8 is specifically because the only things that do UTF-16 these days actually accept arbitrary sequences of 16-bit numbers. |
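A sketch of the unpaired-surrogate case only, using Python's `surrogatepass` error handler as a stand-in. This is not a WTF-8 implementation: real WTF-8 also requires that properly paired surrogates be written as a single 4-byte sequence, which this handler does not enforce.

```python
# A string holding an unpaired high surrogate, as a UTF-16-based API
# that accepts arbitrary 16-bit units might hand over.
s = "a\ud83db"
units = s.encode("utf-16-le", "surrogatepass")   # the raw 16-bit units

# Strict UTF-8 refuses to encode the lone surrogate.
try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The generalized form writes it as a 3-byte sequence, which is what
# WTF-8 does for unpaired surrogates, so the round trip is lossless.
blob = s.encode("utf-8", "surrogatepass")
restored = blob.decode("utf-8", "surrogatepass")

print(strict_ok)                                                # False
print(blob)                                                     # b'a\xed\xa0\xbdb'
print(restored.encode("utf-16-le", "surrogatepass") == units)   # True
```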
23:19 | <&McMartin> | (Because they're all systems that embraced Unicode Too Soon while being standardized.) |
23:19 | < RchrdB> | I knew what WTF-8 is but I had the wrong definition in my head for what CESU-8 was. |
23:19 | < RchrdB> | Yeah, unlucky that. |
23:21 | <&McMartin> | Actually looking at how it works, CESU-8 is Just Really Awful Across The Board. |
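For comparison, a minimal sketch of what CESU-8 does to a code point outside the BMP: each UTF-16 code unit, surrogates included, is encoded separately as a 3-byte sequence. The function name is hypothetical, and this covers encoding only.

```python
def cesu8_encode(s: str) -> bytes:
    """Encode each UTF-16 code unit of `s` separately as a UTF-8-style sequence."""
    utf16 = s.encode("utf-16-le")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "little")
        # surrogatepass lets a surrogate code unit through as 3 bytes.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"
print(emoji.encode("utf-8"))   # b'\xf0\x9f\x98\x80'
print(cesu8_encode(emoji))     # b'\xed\xa0\xbd\xed\xb8\x80'
```

The same code point takes 4 bytes in UTF-8 and 6 in CESU-8, and the CESU-8 bytes are rejected by a strict UTF-8 decoder.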
23:26 | | himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has joined #code |
23:26 | | mode/#code [+o himi] by ChanServ |
23:26 | < RchrdB> | I had to go look up the definition to check and yes it's pretty dumb. |
23:26 | | Derakon[AFK] is now known as Derakon |
23:26 | < RchrdB> | WTF-8 is the one which has a sensible reason for actually existing. >_> |
--- Log closed Mon Nov 13 00:00:29 2017 |