Now this is interesting. Bluesky's ATProto is *explicitly designed* around people being able to replicate and index it. But here's a great example where people's data is being replicated and indexed and users are *furious* about it:

bsky.app/profile/jasonkoebler.

I'm not saying people are wrong for being upset, or that it's wrong to build a protocol that is built around replicated indexing.

I'm saying that if those two things seem to be butting heads, *some* sort of disconnect is happening.

Bluesky Social · Jason Koebler (@jasonkoebler.bsky.social): An employee of Huggingface, a site of AI training datasets, made a dataset of a million Bluesky posts scraped simply because they could. It’s currently trending: https://www.404media.co/someone-made-a-dataset-of-one-million-bluesky-posts-for-machine-learning-research/

Similar things have happened on the fediverse, of course. Personally, I am not as opposed to having global search for *public* posts; I think that's semi-inescapable. Whether it should be opt-in or opt-out is a different question.

But I think what's true on both the fediverse and Bluesky is that people are communicating in ways that are public and easily *can be* indexed, and I think people want more community-oriented private communication than feels "easy to do" on these systems.

@cwebber I think it's simple: posts on #bluesky and #mastodon are intentionally public by design. If you want to have private communication, you can use #signal.

Annika Backstrom

@folkerschamel I think your standards are far too low. What of consent? What about reasonable expectations for how we use the commons? What about not being exploitative? Does the fact that we *can* scrape posts, and that someone probably will, mean we should give up?

@annika I am in favor of protecting copyrights, and of technical mechanisms similar to robots.txt, including ones for opting out of AI training.
But publishing, replicating, and indexing data is exactly the purpose of #bluesky and #mastodon, and when you publish something there, you give permission for that.
By analogy: I wouldn't understand it if someone published a website and then complained that the data was being loaded into visitors' browsers.