08-27-2024, 12:09 AM
John
Administrator
 
Join Date: Aug 1999
Location: NJ, USA
Posts: 2,312
Quote:
Originally Posted by carnation
Can they be blocked?
Usually, yes, but not always. It depends on how determined the bot/spider operators are.

I rerouted these particular bots in a way that should minimize their impact on GC, similar to what I did with the bots from approximately two weeks ago.
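
For anyone curious how the "polite" side of blocking works in general (not necessarily what was done on GC): a site can publish a robots.txt file that names crawlers by user agent and asks them to stay out. Cooperative bots check it before fetching pages; a determined operator can simply ignore it, which is one reason blocking doesn't always work. The Python sketch below uses the standard library's urllib.robotparser on a hypothetical robots.txt. It assumes "Bytespider" as the ByteDance crawler's user-agent token, and example.com is just a placeholder address.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely, allow everyone else.
# ASSUMPTION: "Bytespider" is used here as the ByteDance crawler's user-agent token.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the same way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL.

    This is only advisory: a bot that never checks robots.txt
    (or disguises its user agent) is not stopped by it.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://www.example.com/forums/showthread.php?t=1"  # placeholder URL
    for agent in ("Bytespider", "Googlebot"):
        print(agent, may_crawl(rp, agent, url))
```

A well-behaved crawler runs a check like this before each request. A scraper that skips the check, or pretends to be a different bot, has to be handled on the server side instead, for example by rerouting or rate-limiting its traffic.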

Quote:
Originally Posted by cheerfulgreek
Is that what happens? The more interesting the site, the more bots that show?
Yes, as far as I know, that's pretty much how it goes, at least with search-engine-type bots, since search engines want to direct their users to interesting & useful sites.

Googlebot is a good example of a beneficial bot: it indexes sites & helps increase their listings in search results, which might bring in more new website visitors.

Quote:
Originally Posted by cheerfulgreek
I don’t understand what they collect and the importance of what they collect.
The bots from ByteDance / TikTok seem to be AI data scraper bots, but they might be search-engine related as well.

They might be training AI (artificial intelligence) chatbots on conversations that they scrape from forums all across the Internet.

Around a decade ago, the world entered a new era of AI. Way back in the day, one way AI was built was by programmers hand-crafting decision trees, many levels deep, that determined what to do in different scenarios. Modern AI mostly doesn't work that way. Instead, the engine (the neural network) is given as much labeled data as possible, and it trains on that data in order to recognize patterns within it. The AI is trained & tested over and over, with the neural net constantly being refined and tweaked to keep improving its accuracy. AI recognizing patterns in images is a good example of this type of training.
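
To make "trained & tested over and over" a little more concrete, here is a toy Python sketch. It uses a single artificial neuron (a perceptron, far simpler than the large neural networks behind modern AI) that learns a pattern from labeled examples instead of from hand-written rules, and it checks its accuracy on separate test examples after each training pass. The data, the hidden pattern (label is 1 when x + y > 1), and all function names are made up for illustration.

```python
import random


def make_data(n, seed):
    """Hypothetical labeled data: ((x, y), label), where label = 1 if x + y > 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        data.append(((x, y), 1 if x + y > 1 else 0))
    return data


def predict(weights, bias, features):
    """Weighted sum of the inputs; output 1 if it is positive, else 0."""
    s = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if s > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of examples the model labels correctly."""
    correct = sum(predict(weights, bias, f) == label for f, label in data)
    return correct / len(data)


def train(train_data, test_data, epochs=20, lr=0.1):
    """Perceptron training: repeat passes over the data, nudging the weights
    after each mistake, and record test accuracy after every pass."""
    weights, bias = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        for features, label in train_data:
            error = label - predict(weights, bias, features)  # -1, 0 or 1
            # Perceptron update rule: move the weights toward the right answer.
            weights = [w + lr * error * f for w, f in zip(weights, features)]
            bias += lr * error
        # "Test" on examples the model never trained on.
        history.append(accuracy(weights, bias, test_data))
    return weights, bias, history


if __name__ == "__main__":
    train_set = make_data(500, seed=1)
    test_set = make_data(200, seed=2)
    _, _, hist = train(train_set, test_set)
    print("test accuracy after first pass:", hist[0])
    print("test accuracy after last pass:", hist[-1])
```

Nobody writes the rule "x + y > 1" into the model; it only sees labeled examples and adjusts its numbers whenever it gets one wrong, which is the same basic idea (at a vastly smaller scale) as training on labeled images.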

Since GC consists of text-based conversations, any AI data scraping here is likely for training AI language models, which are basically very sophisticated chatbots, similar to ChatGPT if you're familiar with that. Huge amounts of data, as much as possible, are used in training these AIs.
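
Very roughly, a language model learns from lots of text which words tend to follow which. Systems like ChatGPT do this with enormous neural networks, but the core idea of "predict the next word from patterns in scraped text" can be shown with a toy word-pair (bigram) counter in Python. The sample sentences are invented stand-ins for forum posts, and this is only an illustration, not how any particular company's model works.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the word seen most often after `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(follows, start, max_words=8):
    """Greedily extend `start` one most-likely word at a time."""
    out = [start.lower()]
    for _ in range(max_words - 1):
        nxt = most_likely_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical forum-style snippets standing in for scraped text.
    corpus = [
        "welcome to the forum",
        "welcome to the chapter meeting",
        "see you at the chapter meeting",
    ]
    model = train_bigrams(corpus)
    print(most_likely_next(model, "the"))  # chapter
    print(generate(model, "welcome"))      # welcome to the chapter meeting
```

With three sentences the "model" can only parrot back what it has seen; the more text it is trained on, the more patterns it picks up, which is why these companies want as much data as possible.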

Quote:
Originally Posted by cheerfulgreek
And then where does what they collect go?
My guess is that if it's used for training AI, the data is refined and then used for training AI now and possibly in the future as well. It could be one giant combined dataset of many different online forum discussions that they continually add to & improve. All that data may be used for training AI but might never actually be seen by anyone other than the AI researchers and/or data engineers working on it.
__________________
John Hammell
Network Admin, GreekChat.com