It was just over a month ago that quite a few of us posted about our Spotify 2024 achievements. It seems all of you are using this platform for what it was created for: music. I use Spotify too, though some of my IRL musician friends tell me they are big-ass money grabbers, filling their own pockets while leaving those of the artists mostly empty. My excuse is that I mainly use Spotify to listen to podcasts. As far as I remember, in 2024 I listened to podcasts 3 to 4 hours a day on average. I do listen to music online, using different platforms such as Soundcloud. To my taste, Soundcloud has better recommendation algorithms, and overall, they have much more of the music I prefer. But that aside, on to the point of this post.
Today I am going to address two topics:
- Learnings from a recent podcast I listened to: AI Data Poisoning
- Call to all of you to bring more variety to our 'Q' community
What is AI Poisoning?
Since you can easily find tons of information about AI Poisoning using search engines or the AI services out there, I'll limit myself to quoting a definition from CrowdStrike:
"Data poisoning is a type of cyberattack in which an adversary intentionally compromises a training dataset used by an AI or machine learning (ML) model to influence or manipulate the operation of that model." (source)
Before diving into AI Poisoning and how it can be used for the benefit of artists' copyrighted digital materials, first this...
AI Training Data and Copyright Infringement
Since the release of Large Language Model (LLM) based AI to the general public, a bit more than two years ago, a lot of debate has started about the methods all the big-ass AI companies are using: crawling and scraping the entire Internet for content to train their AI models. In essence, there is nothing wrong with that, since training AI models requires tons of content. No content means no LLM-based AI models.
However, as long as we have copyright laws, this has not only resulted in increasing debate about copyright infringement but has also led to an ever-increasing number of legal/court cases. Two years ago, some warned this was going to happen, as can be read in the article "The current legal cases against generative AI are just the beginning", published by TechCrunch (source). By now, the list is huge! Sustainable Tech Partner, in their article "Generative AI Lawsuits Timeline: Legal Cases vs. OpenAI, Microsoft, Anthropic, Nvidia, Perplexity, Intel and More", published just over two days ago (source), gives a feel for what is happening right now. I suspect the industry giants' legal departments will get very busy fighting all this.
AI Poisoning Used to Protect Copyrighted Digital Content
While CrowdStrike frames AI Poisoning as something bad, calling it "The Exploitation of Generative AI" (source), AI Poisoning can also be used to 'fool' models during training into classifying content wrongly, preventing AI companies from making effective use of such data.
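To make that a bit more concrete, here is a minimal, hypothetical sketch of the classic form of data poisoning from the CrowdStrike definition: an attacker who can tamper with a training set simply flips some labels, and the resulting model gets noticeably worse. This is a toy example in Python with scikit-learn, not how Glaze or Nightshade work; the synthetic dataset, the logistic regression model, and the 30% poisoning rate are all my own assumptions for illustration.

```python
# Toy illustration of data poisoning: an adversary flips labels on part of
# the training data, so the trained model performs worse at inference time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic "clean" dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: model trained on clean labels
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy, clean training data:", clean_model.score(X_test, y_test))

# Poisoning: flip the labels of 30% of the training samples
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("accuracy, poisoned training data:", poisoned_model.score(X_test, y_test))
```

The point of the toy example is only to show the mechanism: whoever controls (part of) the training data controls, to a degree, what the model ends up learning.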
Although I am a big AI enthusiast, I wasn't aware of this until I listened to a podcast by Freakonomics, a channel that has become one of my favourites discovered thus far. They don't go for fast, quick, superficial content, but go into depth on everything they touch and talk about. I can honestly state that their tagline, "The Hidden Side of Everything", is honest to the bone.
Ben Zhao, Neubauer Professor of Computer Science at the University of Chicago, and Shawn Shan, Computer Science PhD candidate at the University of Chicago, stepped up and, with a small team, co-created two tools for protecting artist-owned, copyrighted digital images from being used effectively to train generative AI models.
While Glaze is a tool that makes life difficult for those who take an existing model and try to specialise it by further training it on copyrighted images, Nightshade is a tool that disrupts the training of the general-purpose generative models built by the earlier-mentioned tech-leading enterprises in the AI segment. By now, the tools are used by an increasing number of people, with millions of downloads registered. It is CRAZY these guys are offering these tools free of charge. That said, this is science in its purest form: create something new and allow others, who are perhaps better at setting up non-profit or for-profit companies, to commercialise the technology.
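For the curious, below is a rough conceptual sketch of the perturbation idea as I understand it: nudge an image's pixels, within a budget a human barely notices, so that a model's feature extractor "reads" the image as something else. To be clear, this is not the actual Glaze or Nightshade algorithm, which are far more sophisticated; the choice of ResNet-18 as the stand-in feature extractor, the MSE loss, and the epsilon budget are purely my assumptions for illustration.

```python
# Conceptual sketch of perturbation-based "cloaking" of an image.
# NOT the Glaze/Nightshade algorithm; model, loss, and budget are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Any pretrained feature extractor stands in for "the model a scraper trains"
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()
extract = torch.nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def cloak(original_path, decoy_path, epsilon=0.03, steps=50, lr=0.01):
    """Push the original image's features toward a decoy image's features,
    while keeping every pixel within +/- epsilon of its original value."""
    x = preprocess(Image.open(original_path).convert("RGB")).unsqueeze(0).to(device)
    decoy = preprocess(Image.open(decoy_path).convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        target_features = extract(decoy)

    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        features = extract((x + delta).clamp(0, 1))
        loss = torch.nn.functional.mse_loss(features, target_features)
        loss.backward()
        optimizer.step()
        delta.data.clamp_(-epsilon, epsilon)  # keep the change visually small

    return (x + delta).clamp(0, 1).detach()
```

The interesting trade-off sits in epsilon: large enough to throw the model's feature extractor off, small enough that the human eye hardly sees a difference. That, as far as I understand it, is the balancing act these tools try to master.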
My wish: Whoever intends to market these types of tools will do it for the artists, not just to fill their pockets with tons of money.
Salient detail: OpenAI, the company that turned away from being an open-source outfit to become a for-profit company with a valuation of around 150 billion dollars, takes a standpoint towards tools such as Glaze and Nightshade that is absolutely horrendous. They call the use of these tools abusive behaviour (source). Surely, if I were Sam Altman, I would say the same, since I would have to protect the interests of my shareholders. And surely, they created something we can all benefit from, free of charge, if we don't want to pay for the service they provide. However, that doesn't grant them a free pass to use any content they can get their hands on. Or does it?
I won't be 'boring' you with my review of the podcast, since I believe it is much better for you to just take 30 to 45 minutes of your time and listen to the whole thing yourself.
Listen to the podcast: click the image above, me, or one of the other platforms here.
Consideration
Perhaps you know my stance on patents. Perhaps you don't. In brief, I am against patents, for many reasons. Though holding patents can protect inventors from others running away with whatever they invented, it also drives the creation of moguls and monopolies. For a long time I questioned whether I should take a similar stance towards copyright. On the one hand, copyright does to content what patents do to other types of inventions: both protect something from being used by somebody else. On the other hand, patents drive the creation of large corporations and monopolies and slow down innovation; the same can't be said for copyright on content, whether textual, image or video. However, such content is so easy to copy, and will only become easier to copy with more AI in our lives, that stopping abuse of copyright on a grand scale seems an almost impossible task. Perhaps the work by Ben Zhao, Shawn Shan, and others will help fight this, but somehow my gut says it'll be a darn hard fight to win.
What are your Thoughts?
Go for a comment, or write your thoughts in a fresh new post.
Finally: Call to all of you to bring more variety to our 'Q' community
As I mentioned in my intro above, I call upon you to bring more variety of content to our community. The rules and guidelines in our community profile still apply: somehow, music must be involved. I hope this post showed you that the link to music isn't limited to what one listened to, or to some song we remember because of a life situation we experienced; the context is much wider than that. From today onwards, I'll focus my curation towards diversity in our community (and outside it).
Keep up the good work, and let your creativity not only drive a vibrant community but also add value to the whole HIVE social network.