Into the Disintegration Chambers
Ep. 33

Into the Disintegration Chambers

Episode description

Whispering is an open-source dictation/transcription desktop app, Neo puts the “human” in humanoid robots, and OpenAI stuffs ChatGPT into a web browser.


🔗 Links

Download transcript (.vtt)
0:00

[JM]: I want to mention a new tool that I've been trying called Whispering.

0:03

[JM]: And my motivation with this tool is I spend a lot of time typing on a keyboard, and I like typing, and I like keyboards as we've talked about before, but a lot of typing does at some point get tiring.

0:18

[JM]: It has a physical effect on the body, at least for me.

0:24

[JM]: I want to spend more of my time, when possible, dictating, and less time typing out words.

0:31

[JM]: Sometimes, depending on what I'm doing, that's not really feasible.

0:34

[JM]: If I'm writing code, if I'm typing things that dictation software is not particularly likely to record very well,

0:43

[JM]: then typing is gonna be a much better way to do it.

0:46

[JM]: But composing email, text messages, the type of text I would often compose for a web form, like a GitHub issue, for example, these are kinds of things where dictation software can really do a lot.

0:59

[JM]: And it's really transcription, right?

1:01

[JM]: At the end of the day, it's taking my voice and it is transcribing it into text.

1:06

[JM]: And there are many tools that do this.

1:08

[JM]: One of the more popular ones for MacOS is called MacWhisper, which I believe is a application wrapper around an open-source transcription tool called Whisper, which is why you have MacWhisper and you have this tool that I'm talking about now called Whispering.

1:25

[JM]: And I've heard great things about MacWhisper.

1:27

[JM]: I think it's about 60 US dollars, give or take.

1:32

[JM]: But when I can, I prefer to reach for an open-source solution first and see if it can meet my needs.

1:38

[JM]: And sometimes it does, and that's great.

1:40

[JM]: And sometimes it doesn't.

1:41

[JM]: And I'm more than happy to pay someone to provide a tool that's going to be sufficiently improved over, say, the open-source alternative.

1:51

[JM]: In this case, so far, I'm really enjoying Whispering.

1:54

[JM]: The way it works is I have my cursor in a text field, whether that's an email composition window or in the Messages app where I'm composing a text message, and I hit command-shift-semicolon, which is I think the default key binding, and I will hear this audible sound...

2:12

[JM]: that means that it is listening.

2:14

[JM]: I'll then dictate what it is that I want to say.

2:16

[JM]: I push that same key combination again and then hear another sound.

2:20

[JM]: And then depending on how long I was talking, I will either very shortly see the transcription or I will have to wait for a while and then I will see the transcription.

2:30

[JM]: And it both puts the transcription on the clipboard as well as pastes it wherever the cursor is when you originally push the key combination.

2:39

[JM]: I feel like the transcription is very good.

2:43

[JM]: I'm using an open-source model, or an open-weights model, from Nvidia called Parakeet TDT, which provides a pretty good balance of resource consumption like RAM, transcription speed, and accuracy.

3:00

[JM]: I do wish the transcription were faster because there is lag for longer recordings and it does seem proportional to the recording length regardless of how much silence there was, because sometimes I'll say a sentence and then I'll think for a second.

3:16

[JM]: I should probably just go ahead and push the key combination and have it generate where I am so far.

3:21

[JM]: But it seems like if there's long periods of silence, that it takes just as long than if I were speaking that whole time.

3:28

[JM]: So if you are gonna talk for say 20 seconds and then push the button, you're gonna wait for a while.

3:35

[JM]: Don't expect it to just like pop up immediately.

3:37

[JM]: But if you say something relatively quickly, then you'll get a much quicker response.

3:41

[JM]: That's just been my experience so far.

3:43

[JM]: This Parakeet TDT model handles capitalization and punctuation automatically.

3:48

[JM]: So for me, it's a bit of an adjustment.

3:51

[JM]: If I say something something comma something something period, then I end up with the words "period" and "comma" in my transcription, which is not how it works with, for example, MacOS and iOS dictation.

4:08

[JM]: So if I frequently am going back and forth between these two different tools,

4:12

[JM]: then I have to remember which one I'm using, because on one of them I have to say those periods and commas. On the other one...

4:19

[JM]: I definitely shouldn't unless I want those words polluting my transcription.

4:24

[JM]: The development velocity of this project is great.

4:27

[JM]: I'm really excited to see how much work is going into it, how new features are being added at a rapid pace.

4:34

[JM]: The interface is definitely interesting.

4:37

[JM]: It's not particularly standard.

4:41

[JM]: So for example, I'm used to being able to hit command-comma to get to the settings and that does not work in this application.

4:49

[JM]: And I frequently have to look at this main user interface and stare at it and think, where are the settings, then I

4:57

[JM]: instinctively want to get to them fast so I hit command-comma and nothing happens, but I eventually see the little gear icon and okay there it is

5:05

[DJ]: You'd think for an app like this you could just say the word settings and it would open the settings menu like come on that just seems obvious

5:12

[JM]: So far it's just a transcription tool it doesn't allow you to use voice to control its own behavior but it does have like a always-listening mode and then like the key-command mode that I use where it I guess detects...

5:27

[JM]: when you're trying to dictate, I don't really know.

5:29

[JM]: I mean, I guess it just, if you're in a field and you start talking, it'll just start putting stuff in the field.

5:35

[JM]: I don't really know.

5:35

[JM]: I haven't tried that mode.

5:36

[JM]: The latest version that just came out is interesting because instead of an embedded database, there's a shift now to storing all of the data, the recordings, the transcriptions,

5:50

[JM]: and the transformations, which I'll get to in a moment, as regular Markdown files with YAML front-matter.

5:57

[JM]: And the idea is that instead of using IndexedDB, which was the tool that was used initially when the idea was that Whispering was gonna be a web-first tool, but now that the focus is more on a desktop app,

6:12

[JM]: it doesn't really make sense to be using this browser-based database to store this data.

6:17

[JM]: And so the author recognized this is now the wrong tool for the job.

6:21

[JM]: The data is opaque.

6:23

[JM]: Making backups requires special tooling. Import/

6:27

[JM]: export is complicated.

6:29

[JM]: You don't really own your data.

6:30

[JM]: That's the main thing that the author was focused on and wanted to get it out of this database and into plain text files that you can easily read, edit, backup, and move around freely.

6:41

[JM]: So I think this is a really cool change and sets the stage for easier syncing between devices so that you could sync your recordings, transcriptions, and transformations from one computer to another,

6:54

[JM]: makes it easier to backup your data, restore that data from backups and a bunch of other things.

7:00

[JM]: I mentioned transformations.

7:02

[JM]: I haven't used this yet, but it seems interesting because the idea is that you can apply a set of transformations to your transcription.

7:10

[JM]: So if for example,

7:12

[JM]: you are a non-native speaker of English, maybe this allows you to automatically give it some kind of LLM-based command, like review the transcription for grammatical errors and fix them.

7:26

[JM]: Again, I haven't used the transformation feature.

7:28

[JM]: So I don't really understand what it does and how it works and what those use cases are.

7:33

[JM]: But it does sound interesting that you can do that. I haven't found a particularly notable use case for it yet, but I'm sure they exist.

7:41

[JM]: If you visit the GitHub repository for this tool, the first thing you'll notice is the location of the app and its README is not in the project root where you would expect it.

7:50

[JM]: It's somewhat buried inside other folders.

7:54

[JM]: And the reason is that Whispering is part of a larger envisioned project called Epicenter.

8:00

[JM]: With the idea being that there's going to be, or I guess there already is, this Epicenter Assistant, which is this local-first assistant that you can chat with.

8:09

[JM]: And the idea is that it will become the access point to everything you've ever written, thought, or built.

8:15

[JM]: And on the web site, it says, "A database for your mind built on plain text, one shared context across all your apps."

8:22

[JM]: "Your transcripts inform your notes."

8:24

[JM]: "Your notes guide your AI."

8:26

[JM]: "Everything connects through a single folder of plain text."

8:29

[JM]: And the README says that this Assistant is currently unstable and is waiting for a pull request in another project that I think it depends upon to be merged.

8:39

[JM]: And I think this is interesting from the context of, here's an agent that is running locally, that isn't making external API calls unless you decide that's how you want to do it, that can look at, say, all of your Markdown-based notes

8:58

[JM]: and other projects that you work on and can be queried in interesting ways.

9:04

[JM]: I'm not really sure exactly how I would use this Epicenter Assistant if and when it becomes more stable, but it does seem interesting that Whispering is a part of it.

9:16

[JM]: And I look forward to seeing what it might eventually be able to do.

9:18

[DJ]: I'm interested in checking this one out because I'm a big fan of so-called local-first software, where the ethos behind the design of a piece of software is that it's based around using the resources of your computer, things like your file system, etc., instead of what

9:38

[DJ]: often feels like this sort of automatic and perhaps even inevitable way that software is being built where all the data is on a server somewhere and you probably access it through your web browser.

9:53

[DJ]: So it always makes me happy to come across projects like this that offer an alternative to a popular category of apps because this idea of like put all your data somewhere and we'll help you organize it

10:08

[DJ]: has been around for a while and it's very popular it reminds me of these so-called like second brain apps that got popular over the last five years apps, like Notion is a popular example, and then a more recent addition to that idea of "put all your data somewhere and we'll help you organize it" of course is,

10:28

[DJ]: "Put all your data somewhere and we'll use an LLM to help you organize it."

10:33

[DJ]: But again, I don't really want to use an application like that if it requires all my data to be possessed by some company and stored in a data center somewhere.

10:46

[DJ]: That's not how I'd prefer to roll.

10:48

[DJ]: So I'm really excited to see apps like this and the rise of tools for using large language models locally on our own hardware.

10:59

[DJ]: And of course, those models becoming more and more able to run on local hardware because it means projects like this that can give us a local-first alternative.

11:08

[DJ]: I think that's really exciting.

11:11

[JM]: Okay, moving on.

11:12

[JM]: In other news, Apple has released the 26.1 series of operating systems.

11:20

[JM]: So that includes the iPhone, MacOS, iPadOS, all of the various OSs.

11:27

[JM]: And I don't think either of us have a whole lot to say about 26.1.

11:31

[JM]: I'm not running any of the 26 OSs.

11:36

[JM]: The only reason that I thought this was notable really comes down to two things.

11:42

[JM]: One, it sounds like

11:43

[JM]: Apple has heard all of the many complaints about their Liquid Glass design language and has decided to give folks the option to tone it down a bit by adding some opacity, somewhat of a

11:58

[JM]: frosted effect on the Liquid Glass and giving folks perhaps a less-glass option to make certain visuals a little bit easier to read, providing more contrast, providing perhaps less instances in which some floating control that is transparent makes that control unreadable because it is floating above some other thing and making the effects mostly illegible.

12:26

[JM]: The other thing that I find interesting potentially about this is, as we have talked about many times, I do not turn on automatic updates.

12:35

[JM]: But I know folks who do have automatic updates turned on in part because they won't go and

12:42

[JM]: do it otherwise.

12:45

[JM]: And so I wonder now that Apple has released this point release whether they will turn on — or maybe they have already turned on — automatic updates to this version because there's always like this grace period where Apple will automatically update you

13:03

[JM]: to essentially last year's version until they feel like the current one is baked enough to foist it on you.

13:12

[JM]: And at least up until yesterday, that had not been the case.

13:16

[JM]: If you had automatic updates turned on, you would get automatic updates to 18 point whatever.

13:22

[JM]: with this big banner telling you, hey, if you want to, you can manually update to the 26 version.

13:29

[JM]: But at some point, Apple is going to flip this switch and people are going to wake up the next morning and see this very significant change to their everyday environment.

13:39

[JM]: And I don't know, maybe...

13:41

[JM]: folks will be like, "Oh, this is so cool."

13:43

[JM]: I don't know.

13:44

[JM]: But I know that loved ones that I know who have automatic updates turned on, I will be sending them all a message today that says, "Hey, you might want to turn it off for like, I don't know, a month or two to give Apple enough time to work out perhaps even more kinks, visual or otherwise, to the 26 line of operating systems."

14:07

[JM]: Whether they choose to do that or not, is obviously up to them.

14:09

[JM]: But I feel like it is my duty to at least let them know that this will happen unless they take action at some point.

14:18

[JM]: Moving on to one of my favorite news stories of late is the introduction of the Neo humanoid Robot, a $20,000 humanoid robot that will do your household chores with some fine print involved.

14:37

[DJ]: What is the fine print involved?

14:39

[DJ]: Like, also it watches you sleep?

14:41

[DJ]: Yeah.

14:42

[DJ]: May turn on humanity.

14:45

[DJ]: It might end up having a switch that's set to evil, like in that old Simpsons Halloween episode with the Krusty doll that tries to kill Homer.

14:54

[DJ]: I assume all of that is in the fine print.

14:57

[JM]: No doubt.

14:57

[JM]: Well, for one thing, it's not just $20,000.

15:00

[JM]: It's $20,000 and it's $500 per month.

15:05

[JM]: And there is a six month commitment.

15:08

[JM]: So you're already looking at a minimum of $23,000 over the first six months.

15:16

[JM]: That's your minimum investment to try this thing.

15:18

[DJ]: Sorry.

15:19

[DJ]: Oh no, Justin, I think I've entered a fugue state, and I suddenly woke up and I was negotiating for the purchase of a car.

15:24

[DJ]: What happened?

15:27

[JM]: My favorite part about this is that this so-called robot is not fully autonomous.

15:34

[JM]: There are certain tasks that perhaps it doesn't quite have the fine motor skills to accomplish.

15:42

[JM]: And when it is presented with say one of these tasks, a human will take over and the robot will be controlled remotely by a human wearing a VR headset

15:55

[DJ]: Well, okay, hold on.

15:59

[DJ]: Hold on.

15:59

[DJ]: Sorry.

16:00

[DJ]: A human wearing a VR headset is funny enough, but you said that there are tasks for which it lacks the fine motor skills.

16:06

[DJ]: And when it can't do that, a human will take over.

16:09

[DJ]: But if it lacks the fine motor skills to do a task, that means they actually have to send a person to your house, right?

16:16

[DJ]: Like to do the task instead.

16:18

[DJ]: Because I don't see how a remote operator with a VR headset makes up for a lack of fine motor control.

16:24

[JM]: I don't fully understand how this mode works.

16:28

[JM]: Like what things can it do autonomously?

16:30

[JM]: Like, is there a human whose job it is, 24 hours a day to monitor each and every one of these, like in real time and essentially babysit it and then be like, "Oh wait, it's having trouble with this thing."

16:42

[JM]: And then it just like

16:44

[JM]: flips the manual override switch and then suddenly takes control over it.

16:49

[JM]: I don't really understand how this switch occurs, but the fact that it occurs at all to me is just hilarious.

16:59

[DJ]: I have so many questions.

17:00

[DJ]: So do they have a warehouse full of people sitting next to each other in chairs wearing VR helmets like in some horrible Japanese anime dystopia?

17:09

[DJ]: Or is it like a gig economy thing where it's like, operate this robot from the comfort of your own home sometimes when it can't put a plate away?

17:17

[JM]: I'm fairly confident that is how this is designed to work, yes.

17:22

[JM]: This thing is controlled remotely by humans wearing VR headsets looking at you in your pajamas through the robot's cameras while it tries and mostly fails to load the dishwasher.

17:34

[JM]: And I saw someone, maybe the founder of this company, but someone who otherwise represents the company, comment on this saying, "For the product to be useful, you have to be okay with this."

17:47

[JM]: And "this" being the fact that a human is looking through the robot's cameras,

17:53

[JM]: at your home, the inside of your home.

17:57

[JM]: So, you know, I feel like the tagline for this thing should be: "$20,000, $500 per month, and the privacy of your home are a small price to pay for a digital peeping-Tom who can almost do your chores."

18:11

[DJ]: Oh man, I think that marketing copy could fit in a tweet as well, so we're basically set.

18:16

[DJ]: This is such a stupid idea, and it's amazing and stupid, and it also feels inevitable and stupid.

18:27

[DJ]: It's such a perfect combination of everything I feel like we complain about with regards to the world of technology that we live in.

18:37

[JM]: The tech *business*, right?

18:40

[DJ]: Yeah, exactly.

18:41

[DJ]: The tech business.

18:42

[DJ]: And the other thing is that someone would make that statement, and there's such a thing as telling on yourself, right?

18:48

[DJ]: Whatever the CEO or whoever thinks they're accomplishing with that statement, you can turn it around, because it's like, now, in order for this to be useful, you have to be okay with random people spying on you through the creepy eyes of a humanoid robot.

19:02

[DJ]: Like, okay, the inverse of that is, since many of us are not okay with that, therefore, your product is not useful.

19:11

[DJ]: So, good luck with your Series A round, I guess.

19:16

[JM]: Yeah, because to me, for a robot to be useful, it should be a robot, which this is arguably not.

19:25

[JM]: If a human has to remotely control it, it's not really a robot.

19:28

[JM]: It is a remote-controlled machine.

19:33

[DJ]: Yes.

19:33

[DJ]: And it's also like so many of the creepiest parts of, again, the tech business in the 21st century is that it's like a super awkward recreation of almost the social mores of a different era.

19:46

[DJ]: Because people used to hire people to live in their houses and unload their dishwasher.

19:53

[DJ]: Well, they didn't have a dishwasher, but you take my point, right?

19:55

[DJ]: Like people used to have servants, right?

19:57

[DJ]: And so the promise ever since, like the cartoon The Jetsons, I guess, was that someday, you know, there are lots of problems with having other people as servants.

20:07

[DJ]: We don't have to get into those here.

20:09

[DJ]: I think that's well covered elsewhere for the last hundred years.

20:12

[DJ]: But wouldn't it be nice if you didn't have to do your chores?

20:16

[DJ]: Therefore, robots.

20:17

[DJ]: That's been a longtime science fiction fantasy.

20:20

[DJ]: So now we're trying to make it a reality.

20:22

[DJ]: But how do we make it a reality?

20:24

[DJ]: Well, the only way we can do that right now is to have this simulacrum that's actually backed by a human being.

20:32

[DJ]: So again, you're essentially paying twenty grand plus $500 a month, plus taxes and fees probably...

20:38

[DJ]: to have a servant. It's just that the servant is presumably on a couch somewhere with a VR helmet,

20:44

[DJ]: so that they don't even have to be in your house, which is like the most Millennial thing I can imagine, right?

20:50

[DJ]: Which is that like, of course you want a servant, but the worst thing you can imagine is having to interact with another human being.

20:56

[DJ]: So no, we'll just pop this creepy-looking felt and metal mannequin in there.

21:01

[DJ]: So the person can be somewhere else while they try and fail to put away your dishes and see you in your pajamas.

21:07

[DJ]: They can still see you in your pajamas, Justin.

21:09

[DJ]: And that's the real problem.

21:13

[DJ]: I mean, frankly, if I bought one of these things, I'd have to seriously up my pajama game and I'm just going to leave it at that.

21:19

[JM]: It's so emblematic of the whole "Fake it till you make it" tech phenomenon, right?

21:26

[JM]: Like if you can't make it work, then just fake it.

21:29

[JM]: I wish this were the first time that we've seen a report of some thing that's amazing, some automated thing that we find out later on...

21:39

[JM]: is not actually automated, but a human is doing it manually behind the scenes.

21:44

[JM]: We've seen this happen.

21:45

[JM]: Well, I suppose at least in this particular case, it's up-front, right?

21:48

[JM]: They're not saying, "Hey, here's this robot that's operating autonomously."

21:54

[JM]: Like they're at least saying from the get-go, clearly this thing is at least in part controlled remotely by a human.

22:03

[JM]: And I feel like recently, before this announcement, I saw someone on the Internet saying,

22:08

[JM]: who's a programmer who essentially said something along the lines of, "I don't want AI to write my code, generate my books, music, movies, or other cultural content."

22:20

[JM]: "I just want it to do the menial tasks and chores that I don't want to do."

22:25

[JM]: And ask and you shall receive, I guess.

22:28

[JM]: Like, it seems like not too long after I saw this, this product was announced.

22:32

[JM]: And of course,

22:33

[JM]: I just love the footnote.

22:35

[JM]: Yeah, well, guess what?

22:37

[JM]: It's still a human doing it anyway.

22:38

[DJ]: Yeah, well, I've seen similar complaints where it was like the promise of automation was supposed to be take care of the menial, busy work of human life so that we can spend our time being creative.

22:49

[DJ]: And instead, what has been delivered is the exact complete opposite of that.

22:53

[DJ]: Where now AI generates music and books and code and everyone who would like to do those things themselves is like, "So do I still have a purpose in the world, or is it time to head into the disintegration chambers?"

23:07

[DJ]: And meanwhile, to put your dishes away, you still need a person, presumably not a very well-paid person, by the way.

23:14

[DJ]: How much do you think the people in the VR headsets are getting paid to at least sometimes operate these creepy robots in the houses of people that I presume probably make ten times as much as they do?

23:24

[DJ]: Yeah, we must continue to be frustrated and perhaps outraged at the world that has been delivered to us and try to build a better world that does not include these weird felt-faced robots.

23:36

[JM]: Seems like a good banner to carry.

23:40

[JM]: I can just picture this team, right, developing this robot and thinking, "Man, we're so close."

23:45

[JM]: "If only we could get this robot to do something useful."

23:50

[JM]: "But we can't."

23:51

[JM]: "So until we can, we'll just get a human to control it remotely."

23:57

[JM]: "And then eventually..."

23:58

[JM]: "we'll be able to figure out how to do it programmatically."

24:02

[JM]: "Eventually, we'll make it so that this robot delivers its stated mission and operates autonomously."

24:09

[JM]: "But until then, we can at least ship something."

24:11

[JM]: That seems like what's actually happening here.

24:14

[JM]: And there's just something about it that is just so both sad, amusing...

24:19

[JM]: I don't know how else to describe it.

24:20

[JM]: There's probably other adjectives that I'm missing here, but I don't know.

24:23

[JM]: The whole thing is just, I think it's mostly amusing.

24:26

[DJ]: Ridiculous.

24:27

[DJ]: I think "ridiculous" is also a good one.

24:29

[DJ]: Yeah.

24:29

[DJ]: And the thing about that, what you just described is it's totally the minimum viable product ethos, but that works a lot better when you're talking about like a web app

24:41

[DJ]: where certain features don't quite work yet, or they're like implemented, but they're implemented in a really clunky way.

24:48

[DJ]: And once the company gets a thousand customers and starts generating some revenue, they can re-write their backend or something like that, right?

24:55

[DJ]: Like software is super malleable in that way.

24:59

[DJ]: That model feels, I think, again, like it feels a lot more ridiculous when it's applied to so-called, like, hard, like

25:07

[DJ]: physical-world problems like this.

25:09

[DJ]: Whereas it's like, we want these autonomous robots, but they're not autonomous.

25:13

[DJ]: So they're just going to be remote-control robots for now until we can make them autonomous.

25:19

[DJ]: But it's like, well, for starters, do you see any clear path to actually accomplishing that?

25:24

[DJ]: Or are you just hoping that someone will hand you another quarter of a billion dollars so you can keep picking away at it, I guess?

25:33

[JM]: Okay, moving on.

25:34

[JM]: In a recent announcement, OpenAI said, "Today we're introducing ChatGPT Atlas, a new web browser built with ChatGPT at its core."

25:44

[JM]: "With Atlas, ChatGPT can come with you anywhere across the web, helping you in the window right where you are, understanding what you're trying to do and completing tasks for you all without copying and pasting or leaving the page."

25:57

[JM]: "Your ChatGPT memory is built-in, so conversations can draw on past chats and details to help you get new things done."

26:05

[JM]: "ChatGPT Atlas is launching worldwide on MacOS today to Free, Plus, Pro, and Go users."

26:13

[JM]: "Experiences for Windows, iOS, and Android are coming soon."

26:18

[JM]: My first thought seeing this was, "Experiences?"

26:21

[JM]: Gross.

26:22

[JM]: Why not say "Atlas for Windows, iOS, and Android are coming soon"?

26:27

[JM]: What do you mean, "experiences"?

26:29

[JM]: What is with this marketing language?

26:30

[JM]: I don't understand it.

26:31

[JM]: Also, I think it's interesting that Atlas is out on MacOS before any other platform.

26:38

[JM]: Remember our conversation from last week about how MacOS used to be the nerdy kid at school that was different and weird and got picked on?

26:46

[JM]: Well...

26:47

[JM]: Look how much has changed since those days.

26:49

[JM]: Now all the other platforms are out in the cold, and MacOS is taking center stage for this announcement.

26:55

[JM]: Anyway, Atlas is a Chromium-based browser that can't be used at all without signing into ChatGPT.

27:01

[JM]: And Atlas is by no means unique.

27:04

[JM]: There are a increasing number of browsers being shipped these days with large language model capabilities built into them.

27:11

[JM]: Apparently Atlas has an "agent mode" somewhere that's not super easy to find.

27:18

[JM]: And I haven't heard anyone really talk about this agent mode very much, particularly in a code generation and testing context, which I think is perhaps the only use case that I personally have any interest in.

27:32

[JM]: So at this point, I don't really have any motivation to even install and try this product.

27:38

[JM]: If it did, for example,

27:41

[JM]: have the ability to generate, run, and test code, all in a browser context in a super easy way, I could see that being useful.

27:49

[JM]: But outside of that, I personally don't see many interesting use cases for this class of product.

27:56

[JM]: Looking at the announcement, it seems like even the folks at OpenAI struggle to find compelling use cases.

28:02

[JM]: To me, it feels like Chromium with a chat button bolted onto it.

28:07

[JM]: I don't understand at this stage why someone would prefer this to a combination of a standard web browser and, say, the native ChatGPT Mac application. Combining them into one thing? That...

28:21

[JM]: doesn't seem to provide much utility.

28:24

[JM]: Before I dig into some of the problems that I have with this class of product... Dan, do you see lots of interesting use cases here that I'm missing?

28:33

[DJ]: I don't, but I can't say that that's because I couldn't see them.

28:39

[DJ]: It's more that I haven't looked for them.

28:41

[DJ]: It's very clear to me that this type of product is not for me.

28:45

[DJ]: I assume it's targeted at...

28:47

[DJ]: the average human as opposed to a technology enthusiast, because ChatGPT itself is like massively popular with the mainstream as I understand it, right?

28:59

[DJ]: I think they said they have like 700 billion accounts or something like that.

29:03

[DJ]: There's like 20 trillion humans that use ChatGPT every day or something.

29:07

[DJ]: Those sound like totally real numbers.

29:09

[DJ]: Yes.

29:10

[DJ]: Yeah.

29:10

[DJ]: I mean, no, those numbers are commensurate when you look at like how much money OpenAI is like promising to spend versus the amount of revenue they make.

29:20

[DJ]: Yeah.

29:20

[DJ]: I mean, it all just kind of adds up anyway.

29:22

[DJ]: I guess you could call me a bit of an OpenAI-skeptic at this point, both for their financial future,

29:28

[DJ]: and also just, as I said before, with local-first software, um,

29:32

[DJ]: I'm just reaching a point where every new window I open that says enter your username and password or create an account starts to feel like an amount of friction I don't want to take on.

29:43

[DJ]: I start asking, "All right, do I really need this actually?"

29:46

[DJ]: I actually have several web browsers already on my computer that let me browse the web without needing to sign into an account.

29:54

[DJ]: I don't get why I would want this.

29:57

[DJ]: ChatGPT bolted onto my web browser.

29:59

[DJ]: Like I use ChatGPT.

30:01

[DJ]: It's very useful actually.

30:02

[DJ]: And in a way, querying a large language model instead of doing a Google search and then trolling through web pages is a more effective technique I've found for like finding certain classes of information or solving certain classes of problem.

30:19

[DJ]: Especially in a professional context.

30:22

[DJ]: Like I ran into some weird bug and my app isn't building and I don't understand what this error means.

30:28

[DJ]: Well, I used to paste an error message into a search engine and then click on the first six Stack Overflow posts and hope that the information in there would give me the solution, which can often be effective but is not that easy.

30:44

[DJ]: A lot of the time, ChatGPT provides an experience that is at least no worse and often better than that.

30:51

[DJ]: But that being the case, I don't see why I need to combine the chatbot with the web browser.

30:56

[DJ]: But so I'm going to have to leave it up to other people to explain why this is awesome.

31:00

[DJ]: The fact is, I'm probably never going to try it.

31:03

[JM]: Okay, so it sounds like you don't really see lots of other use cases that are missing here.

31:06

[JM]: And setting aside the question of whether LLM-and-browser mashups provide sufficient utility, I see three primary problems with the entire concept of LLM-infused browsers, regardless of which company is shipping them.

31:23

[JM]: The first one is that this feels distinctly anti-web to me because these things are browsers and yet they are not browsers.

31:34

[JM]: They are more anti-web browsers than anything else because they don't really behave much like browsers.

31:41

[JM]: You ask questions to the chat bot and it gives you answers but it doesn't feel like actually using a web browser.

31:48

[JM]: It feels designed to supplant web browsers

31:52

[JM]: more than making a web browser more useful.

31:56

[JM]: And large language models in general feel like they are designed to supplant web browsers.

32:03

[JM]: So I just don't really feel in general that this is something that's good for the web.

32:07

[JM]: My next concern with this class of product is security.

32:11

[JM]: Because if you think of two things that have been in the news a lot lately, it's AI-powered browsers and NPM supply chain attacks.

32:20

[JM]: NPM is a JavaScript package manager that frequently is used to provide software that runs in web browsers.

32:29

[JM]: And the intersection of large language model-powered browsers and JavaScript supply chain attacks feels like it's not going to end very well.

32:40

[JM]: Jim Nielsen wrote an article on the web, and I'm going to read a portion of it because it explains this much better than I could.

32:45

[JM]: "Imagine for a second something like the following.

32:48

[JM]: You're an attacker and you stick malicious instructions, not code mind you, just plain-text English language prose in your otherwise helpful library and let people install it.

32:59

[JM]: No malicious code is run on the installing computer.

33:02

[JM]: Bundlers then combine third-party dependencies with first-party code in order to spit out application code, which gets shipped to end users.

33:11

[JM]: At this point, there's still zero malicious code that has executed on anyone's computer.

33:16

[JM]: Then end users with AI browsers end up consuming these plain-text instructions that are part of your application bundle.

33:24

[JM]: And boom, you've been exploited.

33:26

[JM]: At no point was any malicious code written by a bad actor executed by the browser engine itself.

33:31

[JM]: Rather, it's the bolted-on AI agent running alongside the browser engine that ingests these instructions and does something it obviously shouldn't.

33:41

[JM]: In other words, it doesn't have to be code to be an exploit.

33:45

[JM]: Plain-text human language is now a weaponizable exploit, which means the surface for attacks just got way bigger."

33:52

[JM]: And I think Jim has really succinctly summarized my concerns about the security risks

33:58

[JM]: of this entire class of LLM-infused browsers.

34:03

[JM]: My third concern with this whole class of product is privacy.

34:07

[JM]: In their announcement, OpenAI says the following, "By default, we don't use the content you browse to train our models.

34:15

[JM]: If you choose to opt-in this content, you can enable include web browsing in your data control setting.

34:22

[JM]: Note, even if you opt into training, web pages that opt-out of GPTBot will not be trained on.

34:30

[JM]: If you've enabled training for chats in your ChatGPT account, training will also be enabled for chats in Atlas.

34:36

[JM]: This includes web site content you've attached when using the Ask ChatGPT sidebar and browser memories that inform your chats."

34:46

[DJ]: I've noticed that when it comes to LLM companies and privacy, the thing a lot of people seem to have gotten in their heads is I don't want them to train their models on something, my information, my whatever.

35:01

[DJ]: I actually think there's a simpler concern even than that, which is merely, I just don't want one more gigantic tech company to have any insight whatsoever into what I'm doing on the web.

35:18

[DJ]: I'm just not interested in sending a bunch more information about myself to one more of these companies, because I have no reason to trust their motives or behavior.

35:30

[DJ]: So, so I agree with you about like all of the problems that you've just said, but especially with this privacy thing.

35:36

[DJ]: Yeah, for me, it doesn't even have to do with, are they going to train a large language model?

35:40

[DJ]: Like, I don't think I feel as strongly about that as a lot of people do.

35:44

[DJ]: For me, it really is much more just like OpenAI saying like,

35:47

[DJ]: "Hey, we want to just have like an open network connection on your computer more often so that we can, you know, and more of the data, more of the things you do on your computer are going to travel down that connection to us."

36:02

[DJ]: "Sound good?"

36:03

[DJ]: And my response is "No, it doesn't."

36:06

[JM]: My biggest concern with this is that people will opt-in to this feature.

36:11

[JM]: They will say, I want to include web browsing because of some perceived benefit that this will provide them greater capabilities in an Atlas browsing context.

36:24

[JM]: And I think one of the primary motivations for OpenAI to ship Atlas is not to so much ship a browser, but instead to ship something that will circumvent scraping-blocking attempts.

36:40

[JM]: To me, it feels more like a distributed human-based scraper than anything else, because large language model scraping is reviled enough that bots from companies like OpenAI, and the IP ranges of companies like OpenAI, often end up in block lists.

37:01

[JM]: And plenty of servers outright terminate any connections incoming from these perceived bot scrapers.

37:09

[JM]: But what if you didn't need to play a game of cat-and-mouse and could instead convince people to enable this, and then you can use regular residential, civilian IP addresses with organic-looking traffic patterns

37:23

[JM]: that can use humans to bypass CAPTCHAs, provide proof-of-work challenges... all of this for free.

37:32

[JM]: You don't even have to pay people to do it.

37:33

[JM]: People will just volunteer and enable these opt-in LLM browser features.

37:39

[JM]: And think of the people who are accessing, like, closed forums that are closed for good reasons.

37:45

[JM]: So to me, more than anything, it feels like a Trojan horse where this is going to enable people to access knowledge and information that the scraper bots can't access.

37:58

[JM]: And it also potentially means that online safe spaces can be at risk.

38:05

[JM]: This idea that they say, "Okay, well, web pages can opt out of the GPTBot and we won't train on that stuff."

38:12

[JM]: I don't find this particularly satisfying.

38:15

[DJ]: I don't see why we should take a company like this at their word.

38:18

[JM]: And I think of all of the places that either don't even know that they can do that, or don't know *how* to do that.

38:26

[JM]: So to me, there's this disconnect where the person using the Atlas browser, they're the ones opting into the training and then ingesting data from people who *didn't* opt into it.

38:37

[JM]: They have to opt *out* of it.

38:39

[JM]: That to me feels like the most sinister, unstated motivation behind this class of product.

38:45

[JM]: As for what you can do, you can just use a regular browser and not use Atlas.

38:51

[DJ]: Yeah, you can not use Atlas.

38:54

[DJ]: Let us introduce you to other browsers you may not have heard of.

38:58

[DJ]: Like, well, I don't really want to say Chrome, but at least then you're only feeding your data to Google, which you're probably already doing.

39:04

[JM]: Don't use Chrome.

39:05

[JM]: Just use Safari, Firefox, Vivaldi, Opera... insert anything else really other than, say, Chrome and browsers powered by large language models.

39:17

[DJ]: You might have, like, a retro virtual machine running Internet Explorer 6 just to let you re-live the glory days of 2002.

39:25

[DJ]: That could be fun.

39:28

[DJ]: Take Justin's advice for what browsers to use.

39:30

[DJ]: Mine is bad. Bad advice.

39:33

[DJ]: Don't use Atlas and don't do any of the things Dan suggests either.

39:37

[JM]: All right, everyone.

39:37

[JM]: Thanks for listening.

39:38

[JM]: We hope you enjoyed the show.

39:39

[JM]: You can find me on the web at justinmayer.com and you can find Dan on the web at danj.ca.

39:44

[JM]: Please reach out and share your thoughts about this episode via the Fediverse at justin.ramble.space.