Hi all, I recently ran 2 studies. At the end, participants get a code that they have to submit with the HIT. Each took about 30 minutes and we paid $6 ($5 base with a $1 bonus). Everything came in fine. Every once in a while, someone would email that they ran into an error, and in a couple of cases (out of about 200) people submitted their MTurk worker ID when they were running out of time, then finished the survey. However, now I'm running a study that takes about 55 minutes and pays about $10 ($9 with a $1 bonus). All of a sudden, about half of the HITs turned in are just people's worker IDs, and they aren't completing the task. I can't figure out what's happened. You'd think that if someone thought the task was too long or hard, they would simply return the HIT and move on. But they submit anyway, and now I'm in the position of having to decide whether to reject these useless HITs. (So far I haven't, because I don't want to get a bad rep, but it's been >$100 down the tube and I can't keep paying people who do this.) Anyone else run into this / have suggestions?
If it's falling in line with the times that you saw on the previous survey, is there any chance that workers are getting linked to that one instead? Have any of the people who submitted with just their ID sent a message stating why? Code errors and survey errors are really common, so when seasoned workers have to submit with their IDs, they usually send a quick message with what happened.
Thanks for your reply @InfiniteChanges -- yeah, that's the strange thing: before, folks would always send me a quick message explaining what's up. My qualification is a 98% approval rate, and so far, by and large, most folks have been really conscientious. But no, none of these workers are sending messages. It's such a sharp contrast with what happened before. I've checked so many times that each piece of my experiment is working, and I can't figure out what else it could be. Nonetheless, I'd think that if people were just disdainful of my rate, they'd either quit or not accept, or at least send some sort of message.
What kind of quals do you have on your HIT? Assuming you are using a US qualification (assuming that's the demographic you're looking for) and a greater-than-98% approval rate, half the HITs being scammed seems extremely high. Maybe there is some issue with the data not being submitted to you correctly? If you want to talk directly to the people you are considering rejecting, you can try bonusing them a penny and telling them you didn't get the data and that you need them to message you or you might have to reject. I don't know your requester name, but you can search for it on https://turkerview.com/ and see if anybody left a review saying they had a problem.
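If it helps, here's a rough sketch of how that penny bonus could be sent through the API rather than the web UI. This assumes you use the AWS SDK for JavaScript and its MTurk sendBonus call; the worker and assignment IDs below are placeholders you'd pull from your results file.
Code:
const AWS = require("aws-sdk");

// MTurk's requester API lives in us-east-1; the sandbox uses a different endpoint.
const mturk = new AWS.MTurk({
  region: "us-east-1",
  // endpoint: "https://mturk-requester-sandbox.us-east-1.amazonaws.com", // uncomment to test in the sandbox
});

const params = {
  WorkerId: "A1EXAMPLEWORKERID",        // placeholder
  AssignmentId: "3EXAMPLEASSIGNMENTID", // placeholder
  BonusAmount: "0.01",                  // dollars, passed as a string
  Reason: "We did not receive your data for the 55-minute study. " +
          "Please message us about what happened, or we may have to reject.",
};

// The Reason text is delivered to the worker along with the bonus.
mturk.sendBonus(params, (err, data) => {
  if (err) console.error(err);
  else console.log("Bonus sent:", data);
});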
Thanks @laby, that's a good suggestion, I'll try it. I actually do monitor myself really closely on TurkerView. I try to be fair on pay, and I have never rejected anyone because I know Turkers don't like it (totally understandable, since it can hurt their work prospects). I have plenty of good ratings from the other 2 studies that "worked", but only one (positive) review from the subjects I ran on this new study. Also, while I'm on the topic of Turkers' opinions: a number of those who completed the survey had positive things to say, and no one said anything bad. I realize there could be a selection bias there, but still, people have said negative things before, respectfully. So I'm at a loss in that regard too.
1. What metrics do you have available on the user experience?
2. Do the MTurk ID submissions come in clusters, or are they evenly spread out?

A single worker, or a small group of workers, could produce this behavior oddity -- for instance, an individual who controls several MTurk accounts at a time. Having prior experience from your last task, that user knows you are the kind and reasonable requester who will accept a worker ID in place of a submission code. They see a $9 task with a 98% approval requirement; 10 accounts with 1 HIT approved on each produces ten accounts with a 100% approval rating. This narrative becomes a bit more far-fetched once you factor in the cost of a good US IP address and the investment of time and money in setting up the accounts and seeing them through the approval process. An individual looking to hedge their earnings would more likely be cooperating with other nefarious workers and coordinating submissions in a small group.

On the flip side, it could be a syntax or logic error somewhere in the code, or ambiguous phrasing that leads the worker to perceive the task as successfully completed. I have had my share of tasks that abruptly end. Normally I will return to avoid the risk of a rejection, but sometimes I pop open the browser console to see what happened. With some errors, I copy the cookies over to a different browser and pick back up where I left off. Sometimes I have to do this if an experiment has a surprise WebGL component or a video with a goofy codec that will play on Firefox but not Chromium. I am not sure where I was going with that, but if you want to post a copy to http://workersandbox.mturk.com/, I'll try to break it.
Just reject the ones you do not have validation for in your study, since they did not contact you saying there was an error or anything of that nature. If they are not scamming you, they will contact you about the rejection and you can reverse it. If they are scamming you, you will not hear back.
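For what it's worth, a rejection can be reversed later through the API if a worker does come back with a legitimate explanation. A rough sketch using the AWS SDK for JavaScript (the assignment ID is a placeholder):
Code:
const AWS = require("aws-sdk");
const mturk = new AWS.MTurk({ region: "us-east-1" });

// Reject an assignment that came back with no completion code.
mturk.rejectAssignment({
  AssignmentId: "3EXAMPLEASSIGNMENTID", // placeholder
  RequesterFeedback: "No completion code was submitted for this HIT.",
}, (err) => {
  if (err) console.error(err);
});

// If the worker later shows the task really did break, the rejection
// can be overturned by approving with OverrideRejection set.
mturk.approveAssignment({
  AssignmentId: "3EXAMPLEASSIGNMENTID", // placeholder
  RequesterFeedback: "Thanks for the error details -- reversing the rejection.",
  OverrideRejection: true,
}, (err) => {
  if (err) console.error(err);
});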
Thanks @Fuzzy_Dunlop, that's an interesting point. I'm realizing that it's hard to have a straightforward stance on MTurk sometimes: if you reject people, some may not like it, but if you don't, others may try to take advantage. I think I'm going to say up front that people WILL get rejected if they don't submit a code, so they can make an informed decision, and then follow the suggestion of @billy. We've piloted the heck out of this thing WITH the guidance of the developers who wrote the software, so I'm pretty sure it's not a bug.
You could also put a spot in your study for them to input their worker ID, as well as giving them a code, so you have double validation.
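If your task page is opened through MTurk's ExternalQuestion/HTMLQuestion frame, the worker ID is already in the URL, so you can record it without asking the worker to type it; otherwise a plain text box works just as well. A minimal sketch (the hidden field ID and name are just examples):
Code:
<script>
  // MTurk appends workerId, assignmentId, and hitId to the frame URL
  // once a worker has accepted an ExternalQuestion/HTMLQuestion HIT.
  const urlParams = new URLSearchParams(window.location.search);
  const workerId = urlParams.get("workerId") || ""; // empty during preview

  // Stash it in a hidden field so it gets saved alongside the survey data.
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelector("#worker_id_field").value = workerId;
  });
</script>

<input type="hidden" id="worker_id_field" name="worker_id">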
I generally appreciate the candor approach, but I have an alternative that would be less off-putting by removing the threat of rejection: in short, a client-side script that validates the code prior to submission. A real deployment would want some minor obfuscation and something more complicated than binary arithmetic, but here is an example using XOR as the validation constraint.
Code:
<script>
  /* Random unique identifier for the worker */
  const UUID = Math.floor(Math.random() * 10000);

  /* Magic value to test against */
  const MAGIC = 314159;

  /*
   * Completion code shown to the worker, e.g. in a paragraph that says:
   * "Good job, your completion code is " + CODE
   */
  const CODE = UUID ^ MAGIC;

  /*
   * On each change event for the completion_code text input box,
   * this function checks that:
   *   COMPLETION_CODE XOR UUID === MAGIC
   * which is only true when the worker has entered the CODE value above.
   *
   * For reference:
   *   if C = A XOR B
   *   then (A XOR C) === B
   *   and  (B XOR C) === A
   * Very easy to validate.
   */
  function submit_button_enabler (event) {
    /* Note the parentheses: ^ binds more loosely than === */
    if ((event.target.value ^ UUID) === MAGIC) {
      document.querySelector("#task_submit_button").disabled = false;
    }
  }

  /* Wire up the listener once the input box exists in the DOM */
  document.addEventListener("DOMContentLoaded", () => {
    document.querySelector("#completion_code")
            .addEventListener("change", submit_button_enabler);
  });
</script>

<body>
  <!--
    Generic markup for an input box and a submit button.
    The button starts disabled and stays that way until the
    input value is proven valid.
  -->
  <label for="completion_code">Please Enter Completion Code Here</label>
  <input id="completion_code">
  <button id="task_submit_button" disabled>Click to submit</button>
</body>
@Fuzzy_Dunlop Oh, very cool. I really like the idea. However, you're talking to a social-scientist guy who takes an Occam's razor approach to anything programming-related, i.e., unfortunately, my necessary priority is to jerry-rig something as fast as possible to get 2 relatively small studies done. If I end up doing more things on MTurk and running larger volumes, I will definitely consider this kind of modification. Thanks for the thoughtful resource!
Do you have some sort of disqualifier in the middle that doesn't indicate they failed out? For example, the survey abruptly ends with no "Sorry, but you failed the xyz check, we cannot compensate you..." message. Oftentimes, people may think it was just another broken survey with no code, especially if they have already invested a considerable amount of time. You indicated that people usually message you, but you can also look up your profile on TurkerView and see what people are leaving as reviews for your HIT; oftentimes people will note that there was no code. Worst case scenario, you can reject a few like others mentioned and wait for the hate mail to see why there is a disparity.
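If you do add a screening check at some point, making the fail-out explicit goes a long way. Something like this sketch -- the element IDs and the check itself are made up for illustration:
Code:
<script>
  // Hypothetical screener: called when a worker fails an eligibility check.
  // Instead of silently ending the survey (which looks like a broken HIT),
  // show an explicit message so the worker knows to return it.
  function failOut() {
    document.querySelector("#survey_body").hidden = true;
    document.querySelector("#screenout_message").textContent =
      "Sorry, you did not pass the eligibility check for this study. " +
      "Please return the HIT so it does not count against you.";
  }
</script>

<div id="survey_body"><!-- survey content --></div>
<p id="screenout_message"></p>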
Hi @j0sh83 -- nope, I don't have a disqualifier (I probably should, with such a long study, but so far my data hasn't shown any unusual patterns indicating people aren't paying attention). It's not a hard task in the sense that if people try, everyone can do it with minimal mental effort (success isn't so much contingent on correctly answering a hard question, like math, as it is on simply following along and indicating your responses as a matter of opinion). Yeah, so far TurkerView for these studies has been neutral-to-positive. I do care and try to keep tabs. Since I put out the upfront warning about rejection, 3 people have submitted no code; one wrote me a short, general email stating there was an error they had proof of, but when I pressed them for the proof, they never got back to me. So this strengthens my conjecture that the no-code people were just taking advantage of the system. Of course, I don't mind paying someone if they can produce an error message. But rejecting when I've already warned folks and they have no proof seems pretty fair to me.