domain blacklist security MOD

Postby admin on Mon Jan 29, 2007 1:46 am

Currently, BetterWindowsSoftware uses a strict set of code and database changes that make it nearly impossible for blacklisted sites (those identified from duplicate, hijacked PAD files) to submit new PADs to our database. This had become a serious problem for our site: on some days we received around 2,000 new submissions, and all but a few of them were duplicates.

The duplicate content also causes problems with search engines, which may decide that our PADKit sites are becoming SPAM sites because of the number of duplicate page titles and duplicate content.

What would be the requirements for a MOD of this nature?

Should there be an option to only disable the records that are blacklisted? This is what we currently do at BetterWindowsSoftware.com: whenever any of the blacklisted content is requested, we display a custom 404 page that lets the user search for the original listing but does not display any of the blacklisted listing's links.

Should the blacklisted domains be stored in a new table -- that is managed through the pad-sysop features? This seems like the best option for webmasters.
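To make the table idea concrete, here is a minimal sketch of what it might look like. The table name `pad_blacklist` and its columns are hypothetical (nothing like this ships with PADKit as far as this thread shows), and the `note` text is only a placeholder. The two sample patterns are taken from the blacklist posted later in this thread.

```sql
-- Hypothetical table: one row per blacklisted domain fragment.
CREATE TABLE `pad_blacklist` (
  `pattern` VARCHAR(128) NOT NULL,   -- e.g. '.4submit.com' or '.allripper.'
  `added`   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `note`    VARCHAR(255) NULL,       -- free-text reason, for the webmaster only
  PRIMARY KEY (`pattern`)
);

-- The kind of statements a pad-sysop page might issue.
INSERT INTO `pad_blacklist` (`pattern`, `note`) VALUES
  ('.4submit.com', 'example entry'),
  ('.allripper.',  'example entry');

DELETE FROM `pad_blacklist` WHERE `pattern` = '.allripper.';
```

Keeping the pattern as the primary key means the same fragment cannot be added twice, which keeps the list easy to maintain by hand.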

Should addpad.php include additional code that compares the description of each new submission against the database, looking for an identical description? If the description of a new submission matches an existing record, should the submission simply be rejected on those grounds, or should the user be redirected to a form to validate by email that this listing should have the same description as another? Other ideas?
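As a rough illustration of the description check, the query below is the kind of lookup addpad.php could run before saving a submission. It is only a sketch: it assumes paddata has `description` and `padfile` columns and that `padfile` holds the PAD file URL. The two literal values are placeholders for the properly escaped values of the new submission.

```sql
-- Count existing listings with exactly the same description
-- that come from a different PAD file than the one being submitted.
SELECT COUNT(*) AS identical_descriptions
FROM paddata
WHERE description = 'text of the newly submitted description'
  AND padfile <> 'http://www.example.com/pad_file.xml';
```

A result greater than zero would be the trigger for either rejecting the submission or sending the user to the validation form.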
admin
Site Admin
Posts: 28
Joined: Wed Aug 09, 2006 8:14 pm
Location: Somewhere in USA

finding PAD file hijackers

Postby admin on Tue Feb 20, 2007 5:35 pm

I thought that it would be useful to post a couple of SQL statements that can be used to find records in your PADDATA table that are copies of others.

Finding duplicates when the title or company is changed
Hijackers sometimes submit their version of the PAD under a completely new company name or title, which gets around the addpad.php code and creates duplicate listings on your site. You can find the duplicated listings by running an exhaustive query on the descriptions. I am not sure whether there is more efficient SQL for this, but here goes...
Be warned that this may take quite a while to execute on a large database, and that it made my SQL server unresponsive to other queries! You may want to run it against an offline copy of your database rather than a live website database.
Code: Select all
-- Self-join paddata on description (no derived tables, so the server can use indexes).
-- Each duplicate pair is returned twice: once as (A, B) and once as (B, A).
select descs.description, descs.padfile, copies.description, copies.padfile
from paddata as descs
inner join paddata as copies
on (descs.description = copies.description) and (descs.padfile <> copies.padfile)
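A possibly lighter alternative is to group on the description instead of joining the table to itself. This is a sketch, not something tested against a real PADKit database; it assumes the same `description` and `padfile` columns used in the query above. Note that MySQL only compares the first `max_sort_length` bytes of long TEXT values when grouping, and that `GROUP_CONCAT` output is cut off at `group_concat_max_len`, so very long descriptions or very large groups may need a second look.

```sql
-- One row per description that appears under more than one PAD file.
SELECT description,
       COUNT(DISTINCT padfile) AS copies,
       GROUP_CONCAT(DISTINCT padfile SEPARATOR ', ') AS padfiles
FROM paddata
GROUP BY description
HAVING COUNT(DISTINCT padfile) > 1
ORDER BY copies DESC;
```

Each duplicated description shows up once, with all of its PAD files on the same row, which is easier to review than the paired output of a self-join.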


Finding records with line breaks ("\n") in them
Some hijackers add a line break to the title so that the code in addpad.php doesn't think it is the same company; since the "\n" character is not displayed, the title looks normal to the end-user.
Code: Select all
select *
from paddata
where title like '%\n%'
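The same idea can be widened to other invisible characters. The variant below is a sketch that also looks for carriage returns and tabs; whether hijackers actually use those is an assumption on my part, since only the line break has been described here. It uses character codes rather than escape sequences, so it does not depend on how backslashes are handled in the string literal.

```sql
-- Titles containing a line feed (10), carriage return (13) or tab (9).
SELECT padfile, title
FROM paddata
WHERE LOCATE(CHAR(10), title) > 0
   OR LOCATE(CHAR(13), title) > 0
   OR LOCATE(CHAR(9),  title) > 0;
```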


Once you've found some hijacked records, you can either delete them or disable them. Disabling them requires code changes as well as adding a field to the paddata table. If your site is indexed, it may be better to disable the records and display a custom 404 page for each. We chose to do this at BetterWindowsSoftware; if you land on one of the hijacked PAD pages here, you will get our special 404 page that enables the user to find the original content that was duplicated.
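For the "disable" route, the database side could be as small as one flag column. The column name `disabled` is hypothetical, the URL is a placeholder, and the PHP pages that show listings would still need to be changed to check the flag and send the 404 page.

```sql
-- Hypothetical flag column: 0 = visible, 1 = disabled.
ALTER TABLE paddata ADD COLUMN disabled TINYINT(1) NOT NULL DEFAULT 0;

-- Disable one hijacked record instead of deleting it.
UPDATE paddata
SET disabled = 1
WHERE padfile = 'http://www.example.com/hijacked_pad.xml';

-- Listing and search pages would then read only the enabled records.
SELECT title, padfile
FROM paddata
WHERE disabled = 0;
```

Because the row stays in the table, a listing can be restored later by setting the flag back to 0.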

Dealing with overwritten data
There is no known way of dealing with this that doesn't involve additional code and some kind of addition to the database. PADKit webmasters have wrestled with this issue since the first hijacker overwrote somebody's data.

A new table named PADDATA_MIRROR could be created to mirror just the "title/company" + "padfile" values of initial submissions. With this table, it would be possible to scan for any PADDATA records that no longer match your PADDATA_MIRROR data.

Code: Select all
CREATE TABLE `paddata_mirror` (
`title` VARCHAR( 50 ) NOT NULL ,
`company` VARCHAR( 50 ) NOT NULL ,
`padfile` VARCHAR( 128 ) NOT NULL ,
`submittime` TIMESTAMP NOT NULL ,
PRIMARY KEY ( `padfile` )
);


Open addpad.php and find this line:
Code: Select all
  if (!mysql_query("replace into paddata ($flist) values($vlist)", $link_id)) die("Error updating paddata table");

Insert this code after it:
Code: Select all
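  // Only first-time submissions are mirrored, so the mirror keeps the original values.
  // ($replace is assumed to be false when the PAD file has not been submitted before.)
  // NOTE: the values go straight into the SQL string; this assumes addpad.php has
  // already escaped $company, $title and $padfile, as for the paddata query.
  if (!($replace)) {
    $flist = "company, title, padfile, submittime";
    $vlist =  "'$company','$title','$padfile',now()";
    if (!mysql_query("insert into paddata_mirror ($flist) values($vlist)", $link_id)) die("Error updating paddata_mirror table");
  }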
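Once paddata_mirror has been collecting data for a while, a scan for overwritten records might look like the sketch below. It assumes that paddata has `title`, `company` and `padfile` columns matching the mirror table. Keep in mind that a mismatch is not proof of hijacking: an author who legitimately renames a product will show up too, and values longer than the mirror's 50-character columns may have been shortened when they were mirrored.

```sql
-- Records whose current title or company differs from what was
-- stored when the PAD file was first submitted.
SELECT m.padfile,
       m.title   AS original_title,
       p.title   AS current_title,
       m.company AS original_company,
       p.company AS current_company
FROM paddata_mirror AS m
INNER JOIN paddata AS p ON p.padfile = m.padfile
WHERE p.title <> m.title
   OR p.company <> m.company;
```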

We aren't doing anything special at BetterWindowsSoftware with respect to these overwritten records. If we find any, we manually import the original PAD. Furthermore, we aren't planning to write a MOD to do this, but we may incorporate the PADDATA_MIRROR table to make things a bit easier when it comes to restoring the initial data.

Postby admin on Sun Feb 17, 2008 7:21 pm

All told, I have disabled about 13,000 records over the last day and a half.

The distinction between a SPAMMER site and an authentic site takes a bit of research, but at times it was obvious. Some of the idiots have no idea that my site will always take their affiliate info out of their RegNow links.

We will not allow our site to have so much duplicate content (there's a Search Engine penalty for this that we're trying to avoid). There is absolutely no reason to have 20 listings of the EXACT same program for "DVD to iPod converter" - the only distinction between these duplicate listings is who gets the commission when somebody buys a program (HINT: the answer is BetterWindowsSoftware will get all commissions on software sales - if we still list the original submission).

If your domain is on this list, and you are an authentic software author (NOT an affiliate or reseller), please contact the webmaster of the site and he'll be happy to remove your site from the filter and restore all of your listings.

A special thanks goes out to one specific company that got caught. It was easy to see their pattern after about 60,000 records.

Here's part of our current list of blacklisted domains. Feel free to add domains to our list.

Code: Select all
.2008software.
.3gp-converter.
.4donentsoft.
.4submit.com
.6te.net
.74.52.237.188
.affilistore-plus.com
.allconverter.
.allforvideo.biz
.allripper.
.alltodvd.
.allvideotools.
.appcraft.
.appletvconverter.
.appletv-converter.
.apple-tv-converter.
.appletvsoftware.
.appletvtools.
.artdownload.
.Artrepodsoftware.
.asf-converter.
.Atesalet-media.
.audiocdcloner.
.audiocdripper.
.audio-mp3-recorder.
.audioripper.
.audio-ware.com
.BalekXonotSoft.
.bdmvconverter.
.Beautifuwemedia.
.bestdvdtools.
.bestmobiletools.
.BestParkerSoftware.
.best-seller-reviews.
.bestvideoconverter.
.bestvideotools.
.BestWesternviddmedia.
.Bestyootools.
.BetaUltrateetools.
.Bigtitiestarware.
.blu-ray-converter.
.bluray-converter.com
.burndvdmovie.org
.buydownload.
.Cartirepointinsoft.
.cdaudioconverter
.cddvdconverter.
.cddvdripper.
.cdtomp3.org
.christmasdownload.
.coldrg.com
.com-download.
.Conidasmeida.
.daoinwod.com
.datconverter.
.Derterkonsoft.
.digidownload.
.dirme.com
.discount-guide.com
.divx-converter.
.DonaMonthsoftware.
.Donnatisoftware.
.Doruichicks.
.doupload.
.download-soft.
.dvdaviconverter.
.dvdcdconverter.
.dvdcdripper.
.dvdipodripper.
.dvd-ipod-tool.com
.dvd-ripper-software.
.dvd-ripping.
.dvdtoall.
.dvd-to-iphone.
.dvdtompegx.com
.dvdvdsoftwear.
.dvd-wizard.co.uk
.enascor.com
.eplanetlabs.net
.evoconverter.
.fearishere.
.feeding-frenzy
.flashconverter.
.free--download.com
.Gardlesnonswill.
.Gionprotesofts.
.godmoon.
.godownload.org
.goldenstarebooks.
.goldmedalsoft.
.GramReleasofts.
.greatcarforsale.com
.hddvd-converter.
.hddvdripper.
.High-Quality-Software.
.hotdvdtools.
.illlll.com
.iphoneconverter.
.iphone-software.
.iphone-video-converter.
.ipod-converter.org
.ipod-mp4-converter.com
.lllii.com
.mostshareware.
.mostsoft.
.mp3towav.
.mp4-converter.net
.mp4-psp-ipod-converter.
.musthavesoft.
.newqite.
.onedownload.
.pcbackup.
.pluskit.biz
.popchristmas.
.popsoftware.
.psp-ipod-converter.
.rip-dvd.
.sharewaredownload.
.sharewaremedia.
.simplehomepage.info
.soft29.
.software-phile.com
.Stakwheimedia.
.SudokuecSoft.
.Sunsharevidis.
.svcdconverter.
.techpedia.net
.tomp4.
.topfourreviews.
.topthreereviews.
.topvideopro.
.triplequadturbo.
.tryitfreedownloads.com
.twodownload.
.urlcut.
.usdigi.
.usdownload.
.vdownload.
.videoconvertersoftwares.
.videodvdburner.
.videodvdcloner.
.videodvdripper.
.VinceStatSoftware.
.vistaconverter.
.vista-download.
.vxak.com
.Warestarkon.
.www.iwellsoft.com
.www.usa-download.com
.www.xilisoft.com
.xmasdownload.
.xoftspy.org.uk
.Ymbrnersoft.
.Youngbeentogeth.
.ZooKooware.
.ZooXooinfomedia.
.zuneaudiosoftware.com
.zuneconverter.
.zune-converter.
.zunevideosoftware.com

Note that some of these may actually block authentic submissions. It is impossible to determine whether or not some of the submissions have overwritten critical data that was used to make our list.
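For anyone who wants to try the list against their own data, here is one way the patterns could be applied as plain substring matches. This is a sketch under assumptions, not a description of the filter running at BetterWindowsSoftware: it assumes `padfile` holds the PAD file URL, and it inserts a dot after "://" so that a pattern such as ".allripper." also matches a URL with no "www." in front of the domain. Only three patterns from the list are shown.

```sql
-- Find paddata records whose PAD URL contains a blacklisted fragment.
SELECT p.padfile, b.pattern
FROM paddata AS p
INNER JOIN (
        SELECT '.4submit.com' AS pattern
  UNION ALL SELECT '.allripper.'
  UNION ALL SELECT '.zune-converter.'
) AS b
  ON LOCATE(b.pattern, REPLACE(p.padfile, '://', '://.')) > 0;
```

With the usual case-insensitive collations, the mixed-case entries in the list match regardless of how the URL is capitalized. Substring matching is also the reason some authentic submissions can get caught, so it is worth reviewing the matches before disabling anything.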

Re: domain blacklist security MOD

Postby jonwood on Tue Feb 03, 2009 5:22 am

I'm very interested in this discussion. I implemented a shareware site using the free ASP PAD code and, while it was nice to see the number of listings grow so rapidly, it soon became apparent that the spam was ruining the site.

Now, I've just completely rewritten the site using ASP.NET. (So the specifics of my code might not interest those here.) But this time I implemented a host of anti-spam measures, and I've just cleared out around 10,000 bad listings and my banned list has grown to about 2,500 domains.

While all this was fresh in my mind, I've written an article that describes some of the techniques I found the spammers to be using, and also describes some of the techniques I used to deal with them. In addition, although this may change in the future, I'm also making my current banned list public. The article and list are available at http://www.fileparade.com/PadSpam.aspx.

I may explore automating spam detection but, as most here understand, that is a tricky proposition and will never be 100% accurate. For now, part of my anti-spam measures includes not approving any listing until I've personally looked it over.

Jonathan Wood
File Parade
http://www.fileparade.com
jonwood
Newbie
Posts: 1
Joined: Sun Feb 01, 2009 8:51 pm

