On Monday morning, many authors woke up to discover that their books had been uploaded and scanned into a sizable dataset without their permission. Prosecraft, a project of cloud word processor Shaxpir, assembled more than 27,000 novels and compared, ranked, and analyzed them based on the “vividness” of their language.
Numerous writers, including Young Adult superstar Maureen Johnson and “Little Fires Everywhere” author Celeste Ng, expressed displeasure with Prosecraft for training a model on their works without permission. Even recently published novels had already been uploaded.
Following a day of justifiable internet outrage, Prosecraft’s developer, Benji Smith, removed the website, which had been online since 2017.
Smith stated, “I’ve worked on this project for thousands of hours, cleaning up and annotating material, organizing and adjusting things. But ‘artificial intelligence’ became a thing in the meantime. Additionally, the early applications of artificial intelligence have harmed its introduction by making it simple for anyone to mimic artists, thus removing them from their creative processes.”
AI Frustration
Because Smith’s Prosecraft had gathered a quarter-billion-word dataset from published novels that he obtained by trawling the internet, writers were concerned that it would turn into a generative artificial intelligence tool.
For each book, Prosecraft displayed two passages: one that was “most passive” and one that was “most vivid.” The novels were then given percentile rankings according to their vividness, length, or passiveness.
“If you’re a writer as a career, it’s maddening, partly because style is not the same as writing a fucking whitepaper for a business that needs to be in active voice or whatever,” novelist Ilana Masad remarked. “Styling is styling.”
Smith did not respond to multiple requests for comment, although he went into further detail about his goals in his blog post.
As Smith explained, “I assumed I was upholding the spirit of the Fair Use concept, which does not require the approval of the original author, since I was merely releasing summary data and brief passages from the text of those books.” Some writers complained that Prosecraft’s snippets of their novels contained significant spoilers, which added to their irritation.
Smith apologized, but the authors are still irritated. The current explosion of artificial intelligence technologies has turned protecting writing and art into a painful game of whack-a-mole for creators. As soon as they remove themselves from one database, they discover that their work has been used to train another artificial intelligence model, and so on.
For these sites and initiatives, Masad added, “It’s very much usual to do whatever they’re doing first and then hope that no one sees and then disappear or become defensive when they ultimately do.”
Self-publishing technologies and generative artificial intelligence have combined to provide the ideal environment for fraudulent activity. Low-quality AI-generated travel guides and even artificial intelligence-generated children’s books have swamped Amazon.
And because ChatGPT is effectively trained on the entirety of the internet, genuine travel writers or authors of children’s books may be unintentionally plagiarized in the process.
In a recent blog post titled “I’d Rather See My Books Get Pirated Than This,” author Jane Friedman claimed that she is the victim of an impersonation on Amazon, where someone is hawking books bearing her name that seem to have been produced using artificial intelligence.
Despite her success in having these bogus books deleted from her Goodreads page, Friedman claims that Amazon will not take down the books listed for sale unless she registers a trademark for her identity.
Amazon did not comment before publication.
Masad stated, “I don’t think any writer is truly worried that artificial intelligence is going to wreck novels because, well, that’s not how literature works, and everything I’ve seen ChatGPT create as a ‘story’ is just very fucking dull with no voice or actual craft or style.”
She is concerned, however, that publishers may be persuaded otherwise and replace marketing and PR staff with AI-generated promotional content.