# README
Transkribus Credits Update Automation
This Go script updates a wiki page with the number of Transkribus credits remaining for the Wikimedia account. It fetches the remaining-credits value from the Transkribus dashboard and appends a new row to the bottom of the table on the Data:Wikimedia_OCR,_Transkribus_quota.tab page.
Local setup
Prerequisites
- Have Go installed and working on your system. The official Go docs cover installation.
- Have an instance of MediaWiki core running on your system, so that you have your own local wiki to test with and do not need to modify any pages on hosted wikis.
- Since we are dealing with a .tab page, the JsonConfig extension must be installed as well. Read more about .tab pages here.
Clone the repository
You can clone the repository by running the following command:
git clone https://github.com/parthiv-m/tr-stat-update
If you wish to fork and then clone the repository, you are welcome to do so!
Set environment variables
The environment variables required to run the script are provided in the .example.env file.
How to obtain Transkribus credentials
This script is intended for the Transkribus account managed by Wikimedia. However, it can be generalised to any Transkribus account.
How to obtain bot credentials
- Navigate to Special:BotPasswords on your local wiki.
- You will be prompted to enter details such as the bot name and to select the grants the bot requires. The only grant needed is permission to edit existing pages.
- The subsequent page gives the bot username, of the form username@bot_name, and a password. Enter both in the .env file under the corresponding variables.
Install packages
There are no major dependencies used in the script except for the godotenv package, which handles the .env file. Nevertheless, run go get . to install all packages listed in the go.mod file.
Once this is done, you are all set to run the script!
Running the script
In general, the command to run a Go script is go run <filename>.go. In our case, this becomes go run main.go.
For the dev environment
When run without any arguments, the script runs in development mode. This is indicated by the logging statement:
Running in development...
For the actual wiki page hosted on Commons
[!WARNING] This will modify the publicly available page. Only run if you are sure of what you are doing!
To update the actual wiki page on Commons, run the script as follows:
go run main.go production
This will produce a logging statement that says:
Running in production...
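For readers curious how a single positional argument can switch between the two modes, here is a minimal sketch. It is not taken from main.go; the function name modeFromArgs is hypothetical, and the sketch assumes that only the exact word production selects production mode.

```go
package main

import (
	"fmt"
	"os"
)

// modeFromArgs returns "production" only when the first argument after
// the program name is exactly "production". Anything else, including no
// argument at all, falls back to "development".
// (Hypothetical helper; the real script may parse arguments differently.)
func modeFromArgs(args []string) string {
	if len(args) > 1 && args[1] == "production" {
		return "production"
	}
	return "development"
}

func main() {
	mode := modeFromArgs(os.Args)
	fmt.Printf("Running in %s...\n", mode)
}
```

Defaulting to development keeps the safe behaviour as the one you get by accident: editing the public page requires typing the extra word.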
Long term usage
If you are not a developer and are not interested in tinkering around with the script, but still would like to run the script from time to time, it is best to download a binary of the script from the releases section.
[!NOTE] Currently, binaries are available only for Linux.
Extracting the downloaded .tar.gz file using the tar -xvf <file_name> command should result in a tr-stat-update file as the final executable. You will still need to set the appropriate environment variables in a .env file in the same directory as the executable.
To run the executable, use:
./tr-stat-update production
Logging
All logs for the script are stored in a debug.log file in the same directory as the script. If you run into any trouble, you might want to check the logs!
What does the script do?
The script follows a linear workflow as outlined below:
- First, it authenticates with the Transkribus API using the login credentials provided by the user.
- Next, it requests the total credits remaining in the user's Transkribus dashboard.
- It then fetches the Data:Transkribus page using the MediaWiki Action API.
- Once the contents of the page are available, the script logs in to the wiki using the credentials of the bot generated by the user.
- After the bot has logged in successfully, the script requests a CSRF token for the bot so that it can safely make edits to the wiki page.
- Finally, the script adds a new row to the page, along with an appropriate edit summary containing the date and time of the update.
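The wiki side of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names (appendRow, loginParams, editParams) are hypothetical, the sample page content and its two-column layout (date, credits) are invented, and the Transkribus requests are left out because this README does not describe those endpoints. The parameter names are those of the MediaWiki Action API (action=login with lgname, lgpassword and lgtoken; action=edit with title, text, summary and token). No network calls are made; the sketch only builds the request bodies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"time"
)

// appendRow parses the JSON content of a .tab page, appends row to its
// "data" array, and re-encodes the document. Note that re-encoding a Go
// map emits keys in alphabetical order, which does not change the
// meaning of the JSON.
func appendRow(page []byte, row []any) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(page, &doc); err != nil {
		return nil, err
	}
	rows, ok := doc["data"].([]any)
	if !ok {
		return nil, fmt.Errorf(`page has no "data" array`)
	}
	doc["data"] = append(rows, row)
	return json.Marshal(doc)
}

// loginParams builds the POST body for action=login with a bot password.
// loginToken comes from action=query&meta=tokens&type=login.
func loginParams(botUser, botPass, loginToken string) url.Values {
	return url.Values{
		"action":     {"login"},
		"lgname":     {botUser},
		"lgpassword": {botPass},
		"lgtoken":    {loginToken},
		"format":     {"json"},
	}
}

// editParams builds the POST body for action=edit.
// csrfToken comes from action=query&meta=tokens after logging in.
func editParams(title, text, summary, csrfToken string) url.Values {
	return url.Values{
		"action":  {"edit"},
		"title":   {title},
		"text":    {text},
		"summary": {summary},
		"token":   {csrfToken},
		"format":  {"json"},
	}
}

func main() {
	// Invented sample content; a real page has its own schema and rows.
	page := []byte(`{"license":"CC0-1.0","data":[["2024-01-01",1200]]}`)
	now := time.Now().UTC()
	updated, err := appendRow(page, []any{now.Format("2006-01-02"), 1150})
	if err != nil {
		fmt.Println("append failed:", err)
		return
	}
	summary := "Update Transkribus credits: " + now.Format(time.RFC3339)
	body := editParams("Data:Wikimedia_OCR,_Transkribus_quota.tab", string(updated), summary, "<csrf token>")
	fmt.Println(body.Encode())
}
```

Building the bodies with url.Values takes care of form encoding, which matters here because MediaWiki tokens end in the characters +\ and must be percent-encoded when sent.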
Further information
- Transkribus is a platform for the text recognition, image analysis, and structure recognition of historical documents. By means of its web interface and a desktop client, it provides users access to a rich set of features to transcribe texts and train custom handwritten text recognition models.
- Wikimedia OCR is a web service and interface for providing OCR text from images hosted on MediaWiki wikis. Transkribus is the newest addition to the set of OCR engines available on the tool. Try it out now!