Talk:Toolhub/Archives/2018

From Meta, a Wikimedia project coordination wiki

Comments on Modifications from Tech News: 2018-18

Your comments are welcome. --Rical (talk) 19:04, 30 April 2018 (UTC)

Translation: please clarify Background

Under Background, it says “...Our foundation will be [[$hays|Hay's Tool Directory]], which describes over 450 tools through ...”.

Which of the following is a safe translation? “...Our foundation (a) is going to move away (b) transfer from (c) discontinue using [[$hays|Hay's Tool Directory]], ...”? My assumption is based on the description that follows. ----Omotecho (talk) 08:30, 19 May 2018 (UTC)

I think something more like "...we will start from [[$hays|Hay's Tool Directory]], ...". "Foundation" here is used in the sense of w:Foundation (engineering), meaning the base on which a larger structure will be built. --BDavis (WMF) (talk) 20:02, 19 May 2018 (UTC)
@BDavis (WMF):, yes, I agree we should avoid the term “foundation” to prevent confusion. So, “...start from...” sounds very good to me. Regards, ----Omotecho (talk) 21:19, 19 May 2018 (UTC)

A few points/questions

Hi

I'm not sure if I'm writing this in the right place. A bit of background: I write quite a lot of documentation for Wikidata and previously worked with Hay on trying to create a tool to record documentation on Wikimedia projects some years ago. A few points/questions:

  • The other WMF project to document resources that I'm aware of is the Wikimedia Resource Center. How will these two resources interact? There are very likely to be at least some things documented in both places, and it's not clear where the boundary would be. E.g. Flickr2Commons is a tool, but there are also instructions on how to use that tool, so it seems likely that both the tool and the instructions would appear in both the Wikimedia Resource Center and Toolhub. Will there be any shared software between the two resources? How much crossover will there be? If there is likely to be a high percentage of duplication, could these be combined into one site? What benefits and costs are there to keeping them separate or combining them?
  • What costs and benefits are there to creating a database separate from Wikidata? Looking at the data model, it appears as though these fields could fit into Wikidata. Could the information for Toolhub be held on Wikidata, with the Toolhub interface simply being a Wikidata explorer? Perhaps Monumental could offer an idea of the kind of thing that would be possible.

Thanks

John Cummings (talk) 09:17, 29 May 2018 (UTC)

Hello John Cummings,
  • A very good question; thankfully I am also familiar with the Wikimedia Resource Center. The specific example you give has a fairly straightforward answer – tool documentation would be coupled with the tool and thus included as part of that tool's record in Toolhub. But in general I view the Resource Center as a superset of Toolhub, meaning anything in scope for Toolhub is also in scope for the Resource Center. I've talked this over with María, and the current plan to prevent duplicate effort is to embed a tools section for each audience view (for contributors, for affiliate organizers, etc). To make this possible, each tool will be annotated with an audience designation according to the same scheme used by the Resource Center, and the Resource Center will automatically pull this data. I don't know if this is the best approach, but it's the least complicated one and it helps keep the two resources integrated.
    • On the specific question of shared software – not at first, but something I would like to look into for the long term. The challenge in building any software like this is making it purpose-specific but not too purpose-specific (so as to exclude even the possibility of supporting other use cases). With Toolhub, my goal is to support the particular use case of documenting software tools well, but in a way that sets up best practices for the many other use cases of the Resource Center.
  • The main precondition is the Wikidata community deciding that tools are inherently notable and should be documented on Wikidata. While I think there is a clear value proposition in creating a database of community-built tools, I don't think the responsibility for maintaining this database should be imposed on any particular volunteer community – especially if such an imposition results in data being rejected. (It's worth noting that some of the data fields proposed for Toolhub don't seem like good fits for Wikidata, such as the "related topics" field that is basically a generic keywords field.) That said, the data model does provide a field for Wikidata ID, and Wikidata could be used for supplementary data.
Harej (WMF) (talk) 06:01, 30 May 2018 (UTC)
Thanks very much @Harej (WMF):, all good points. If the Toolhub had fairly static URLs for each tool then a Wikidata property could be created to link through to the Toolhub page for each tool. This may allow you to understand coverage of existing tools on Toolhub and make pretty graphs and maps and stuff. John Cummings (talk) 15:32, 31 May 2018 (UTC)

Internationalization and multilingual issues

I have one question and one remark/proposal.

On supported_languages, I wonder how to specify that a tool supports a very large number of languages.

  • I understand that I might supply an Array of some particular language codes.
  • The major functionality of my lintHint gadget works with some 200 or more languages, I don’t know which and how many, since it utilizes the linter system messages provided by MediaWiki.
  • The code mul is defined by ISO 639, but it is not translated into a message, let alone given a reasonable explanation, and it probably won’t match a search for all Toolhub entries supporting French or Portuguese, since mul matches neither fr nor pt.
  • Perhaps a * language code becomes necessary, matching all language queries and explained as “almost every language”.
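The matching problem described above can be sketched in a few lines of Python. This is only an illustration: the function name `supports_language` and the rule that a bare code such as `pt` also covers `pt-br` are assumptions, not part of any Toolhub specification.

```python
def supports_language(supported_languages, wanted):
    """Return True if a tool record claims support for the language code `wanted`.

    `supported_languages` is the list from the tool record. A "*" entry is
    treated as "almost every language" and matches any query, while "mul"
    gets no special treatment and therefore matches nothing but itself.
    """
    wanted = wanted.lower()
    for code in supported_languages:
        code = code.lower()
        if code == "*" or code == wanted:
            return True
        # Assumption: a bare language code also covers its variants ("pt" covers "pt-br").
        if wanted.startswith(code + "-"):
            return True
    return False
```

With this rule, a record listing `["*"]` turns up in a search for French or Portuguese, whereas a record listing `["mul"]` does not.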

The other issue is that I am astonished about the JSON model.

"description": {
          "type": "string",
          ...
          },
  • I would have expected something like:
"description": {
          "oneOf": [
            {
              "type": "string"
            },
            {
              "type": "object",
              "additionalProperties": {
                "type": "string"
              }
            }
          ],
          ...
  • Looking at mw: Extension:TemplateData #InterfaceText (string or object) I would expect that I can provide various components either as string in one particular but unspecified language (hopefully English) or I would offer a map of strings, or at least one explanation and the assigned language code.
  • BTW TemplateData is a pretty good example for a possibly multilingual description of a utility.
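As a rough sketch of the "string or object" idea, the check below accepts either a plain string or a map from language codes to strings. The names `DESCRIPTION_SCHEMA` and `is_interface_text` are made up for this illustration. The schema fragment uses `additionalProperties` to constrain the values of the map, because in JSON Schema the `items` keyword applies to arrays rather than objects.

```python
# Hypothetical schema fragment for a field that is a string or a language map.
DESCRIPTION_SCHEMA = {
    "oneOf": [
        {"type": "string"},
        {"type": "object", "additionalProperties": {"type": "string"}},
    ]
}


def is_interface_text(value):
    """Hand-written equivalent of the fragment above, stricter in one respect.

    Accepts a plain string (one unspecified language) or a non-empty dict
    whose keys are language codes and whose values are strings. Rejecting
    an empty map is an extra assumption the schema fragment does not make.
    """
    if isinstance(value, str):
        return True
    if isinstance(value, dict) and value:
        return all(
            isinstance(code, str) and isinstance(text, str)
            for code, text in value.items()
        )
    return False
```

Because every plain string still passes, records written against a string-only model would remain valid under this wider type.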

Let’s take my lintHint gadget as an example. I would like to advertise it as follows:

"description": { "en": "Show LintErrors analysis live.",
                 "de": "Zeige LintErrors-Analyse live.",
                 "it": "Mostra analisi degli errori di Lint in diretta." },
...

(you might have a look at line 118 of the code which uses exactly that to introduce itself, depending on current user language).

The same as for descriptive texts goes for documentation pages. The current data model specifies that there is one and only one documentation page url as a kind of tool homepage.

The same as for these two components goes for many others. I would expect that a user has switched the GUI of Toolhub to the available language that best fits their preferred tongue, or has at least specified a language preference. Then every multilingual item tries to find the best match, with fallback to English and, if there is no English, to the one and only chunk that is available. This goes for:

  • title
  • description
  • url
  • keywords
  • subtitle
  • feedback_url
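The lookup order described above (the user's language, then English, then whatever single chunk exists) could look roughly like this. The `FALLBACKS` table and the function name `pick_text` are illustrative assumptions; MediaWiki maintains its own, much larger fallback chains.

```python
# Hypothetical fallback table, for illustration only.
FALLBACKS = {"nds": ["de"], "gsw": ["de"]}


def pick_text(value, user_lang, fallbacks=FALLBACKS):
    """Choose the best text for `user_lang` from a string or a language map."""
    if isinstance(value, str):
        return value              # single-language field: nothing to choose
    if not value:
        return None               # empty map: nothing available
    # Try the user's language, then its fallbacks, then English.
    for code in [user_lang, *fallbacks.get(user_lang, []), "en"]:
        if code in value:
            return value[code]
    # No English either: offer the first (possibly only) chunk available.
    return next(iter(value.values()))
```

The same function would serve for title, description, keywords and the other fields in the list, since it only depends on the shape of the value.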

Screenshots, video, additional information and other items planned for later extensions are subject to localization as well.

Greetings --PerfektesChaos (talk) 12:57, 5 June 2018 (UTC)

Hello PerfektesChaos, thank you for your question. For the lintHint gadget, it would probably make sense to just mark it as supporting every language – "*" would be the way of marking that. Even if your tool doesn't support literally every language, it's probably close enough. (I'd be interested in knowing if it would somehow be misleading.) As for multilingual support, you are correct to point out that its structure implies only one language. This is intentional, to help keep the files straightforward. Instead of having every language in one file, and having to keep all the translations in sync in that file, the original record is just in one language. Then, the record is synchronized with Translatewiki.net, where volunteer translators can translate the fields into different languages. This way, translators can more easily find the strings to be translated and it is clearer what is the "official" record and what is a translation. As for tool documentation, in the planned data model, this is covered through annotations – a community-editable adjunct to the tool records – and it is possible to supply multiple links. I hope this addresses your concerns; please let me know if there is anything that should still be addressed. Harej (WMF) (talk) 19:55, 12 June 2018 (UTC)
Translators working on text fragments on translatewiki do not address my point.
  • url is split up into several links (URL or wikilink for simplification):
  • The best match among the available URLs for the current user language shall be offered, rather than a remark hidden somewhere in English annotations saying that, if you do not understand English but do understand German, you might find a URL covering German. And nds or gsw might fall back to the nearest appropriate language, as we are used to in wiki software.
  • It is not a matter of text translation. A URL is not translatable; a tool does not provide all languages but only some particular ones, and the URLs or links might point to various sites.
Same for:
  • feedback_url
  • Screenshots
  • video
Regarding synchronization of the various translations, I regard it as even better when the English base description text and the official translations by the provider are kept in one place. The descriptions are not made for eternity, like a phrase. Features and capabilities may be added to or removed from the purpose description and annotations. They are subject to dynamic development. I worry about feature descriptions diverging when the one and only English base description of the gadget is updated while nobody knows where and how to update the translations.
“Instead of having every language in one file, and having to keep all the translations in sync in that file, the original record is just in one language.” – Oh, yes, it is one and only one file and one record only. However, the data type of such elements is either string (probably English) or object (multilingual, with language codes pointing to various languages). That keeps “all the translations in sync in that file” even better than having the English description of current capabilities in the one and only file and the updates in other languages somewhere on translatewiki. Please have a look at TemplateData, which has one single definition only but multilingual texts inside.
Greetings --PerfektesChaos (talk) 08:42, 13 June 2018 (UTC)
Hello PerfektesChaos. If I understand your point here correctly, it’s that if there is only one field for the URL, there is a risk that alternate language versions end up being hidden. And that this is especially a problem with gadgets and user scripts since you link to specific wikis. I agree in principle that supporting multiple URLs based on preferred language would be better, and that this is different from tool metadata. However, since this version of the data model is supposed to be backwards compatible with the old one, that means the behavior of the “url” parameter is supposed to remain the same – just a URL and nothing more. However, for version 2.0.0, we can consider a “url” parameter that gives equal support for multiple URLs, including language variants. Would that address your concern?
As for other fields: I agree that screenshots and videos should be annotated with their language, and that the screenshots and videos shown should depend on the user's language. As for feedback_url, is your idea that there are different destinations for leaving feedback depending on the user's language? Wouldn't that result in feedback being scattered in different places instead of organized for the convenience of the tool developer? (I'm interested in knowing if there is an angle I am not considering here.)
Harej (WMF) (talk) 03:47, 19 June 2018 (UTC)
  • I cannot see a compatibility problem between the expected release of the Toolhub version and some “old” software.
    • Once the release described in the announcement on the reverse side of this talk page is launched − why would any “old” software need support through a limited data model, which software would that be, and for what period?
    • My suggestion is compatible with all previous models, since it extends permitted data types from “string only” to “string or L10N object”.
  • feedback_url − yes, it is possible and in scope that there are specialised feedback channels for some particular languages, while all others are merged into a mixed help desk.
    • Please see w:de:WP:VE/FB which is a native German feedback page for VisualEditor.
    • Why are tools obliged to limit themselves to one feedback language only? Why not split if desired, answering French questions on French Wikipedia and all other languages on a babylon URL? Or have a Cyrillic or Chinese branch with a native audience?
  • There are some issues which are not meaningful to be split, and others should be best adapted to the user language preference if available.
    • Upstream code distribution is a unique place. Version ID, programming language, application type (bot, tool, gadget) and other enumerations are not subject to language dependency. Release date as well, but might get localized formatting. Programmers as audience could suppose a single English documentation.
    • Most others could be offered in more languages than only English, e.g. if the native tongue of the developer is Japanese or Spanish anyway. The English translation is a concession to the global community and might be mandatory, but the developer's own version is already available before that.
    • Keywords are another example. When keywords are displayed and offered multilingually, the best-match language should be presented; a Russian speaker might not really understand, or even be able to read, Latin script, and the same goes for Arabic speakers. It is a challenge anyway: while only the specific subset is presented, all keywords are considered in searching and filtering, perhaps even by translating a limited controlled vocabulary.
  • I was not happy that the development of toolforge:hay/directory was terminated at an English only state.
    • IIRC in 2014 I made suggestions like these for a widened data model, which should have been considered for the next release. However, this was never implemented. To this day the data model is one value, English only, as in the first design.
    • That directory is rather useless for people who are not familiar with English terms, nor are there sufficient filter capabilities to express a query for a tool, but not a bot, for a particular Wiktionary challenge.
    • Since the data model of Toolhub seems to me much the same as that one (and a full import of all previous entries appears possible), I am afraid that we will end up once again with an English-only solution for English speakers only.
    • TemplateData as of 2012 had multilingual support from its first design. I am not happy to be discussing an English-only view in 2018.
  • BTW, I do appreciate the way how hay/directory uses distributed data sources.
    • I would expect that a maintainer will register an application just by leaving a URL or a WMF page name.
    • E.g. mw:User:.../myGadget/toolhub.json
    • toolhub.json is detectable somewhere in that page, if HTML, e.g. by an id="toolhub" element, or MIME is json for the entire page.
    • There maintainers can update description, version ID and all details without a login procedure and authentication at the Toolhub administration.
    • At least once a day the ToolhubBot will visit all locations, collect and update possible changes, and gracefully ignore unavailable URLs for some weeks.
    • Maintainers do know best the features which are currently available, and their translation takes precedence. Some other translations might be made by volunteers on translatewiki, but nobody knows how accurate such contemporary text might be.
    • User pages of type .json can be modified only by the user himself, which protects against vandalism.
  • Please note that I am maintaining two overview lists for applications (w:de:WP:LT and w:de:WP:HX) with thematic narrowing which I want to see sufficiently replaced by Toolhub one day, handing over the baton and retiring. The current data model is far off this target.
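The detection step suggested in the list above (a page that is either JSON as a whole or HTML containing an `id="toolhub"` element) might be sketched as follows, using only the Python standard library. The function and class names are hypothetical, and fetching, scheduling and the grace period for unavailable URLs are left out.

```python
import json
from html.parser import HTMLParser


class _ToolhubElement(HTMLParser):
    """Collect the text inside the first element carrying id="toolhub".

    Simplification: assumes no void elements (such as <br>) inside it.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while inside the target element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == "toolhub":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_record(body, content_type):
    """Return the parsed record, or None if the page holds no usable JSON."""
    if content_type.split(";")[0].strip().lower() == "application/json":
        text = body                      # the whole page is the record
    else:
        parser = _ToolhubElement()
        parser.feed(body)
        parser.close()                   # flush any buffered text
        text = "".join(parser.chunks)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A bot built on this would call `extract_record` once per registered location and keep the previous record whenever `None` comes back.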

Greetings --PerfektesChaos (talk) 20:09, 19 June 2018 (UTC)

Hello PerfektesChaos:
  • Based on your recommendation, for `developer_docs_url`, `feedback_url`, and `privacy_policy_url`, you will have your choice of a bare URL or an array of objects containing a URL and language code, as such: {"language": "de", "url": "https://example.com"}. I will post this to Meta when I finalize the schema on Saturday.
  • Doing this for `url` would be tricky. The toolinfo standard used by Hay’s Tool Directory has been used to document hundreds of tools since 2014, the `url` parameter has been there from the beginning, and I do not want to change its expected behavior overnight. However, as a compromise, I have added a new parameter `url_alternates` which should be like the other ones mentioned. This way, you have one `url` that serves as a default/fallback but could specify e.g. custom URLs for different user languages.
  • I did not extend this change to `api_url` or `translate_url` since, to the best of my knowledge, those should only be one link. Let me know if that is okay.
  • Screenshots and videos will definitely support multiple languages from the start. If there is a German screenshot, and the user has their language set to German, they will see a German screenshot.
  • In general, documenting gadgets and modeling them seems to be a difficult challenge, since each wiki has its own collection of gadgets, but we don’t want, for instance, separate entries in Toolhub for lintHint in German, in English, and Italian. Working out the various complexities with this is something I would like your help with in the future, if you are available to provide it.
  • The reason for centralizing string translations on translatewiki.net is that this is where the technical translators already do their work. Just like we want the interfaces of tools translated into as many languages as possible, we want to make it as easy as possible for the translator community to participate in translating the tools’ metadata. Unfortunately, I’m not aware of any way to have local overrides for translations without adding significant complexity and making the overall user experience worse. I agree in any case that URLs do not belong on translatewiki.net as strings to be translated.
  • The LT and HX pages you linked are very impressive. I wonder if we could think about how to migrate those documentation pages to MediaWiki.org or Wikitech so that the detailed information compiled there can be used more extensively by all the communities.
Please let me know if you have any additional questions or concerns. Harej (WMF) (talk) 00:41, 30 June 2018 (UTC)
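To make the compromise above concrete, here is a minimal sketch of how a client might resolve these fields. The function names and the rule of falling back to the first entry of an array are assumptions for illustration; only the field names `url`, `url_alternates` and `feedback_url` and the `{"language", "url"}` object shape come from the discussion.

```python
def pick_url(record, user_lang):
    """Prefer a `url_alternates` entry for the user's language, else the default `url`."""
    for alt in record.get("url_alternates", []):
        if alt.get("language") == user_lang:
            return alt["url"]
    return record["url"]


def pick_link(value, user_lang):
    """Resolve a field that is either a bare URL or a list of {language, url} objects."""
    if isinstance(value, str):
        return value
    for entry in value:
        if entry["language"] == user_lang:
            return entry["url"]
    # Assumption: with no language match, offer the first entry (if any).
    return value[0]["url"] if value else None


# Example record with placeholder URLs.
record = {
    "url": "https://example.com/",
    "url_alternates": [{"language": "de", "url": "https://example.com/de"}],
    "feedback_url": [
        {"language": "en", "url": "https://example.com/feedback"},
        {"language": "de", "url": "https://example.com/rueckmeldung"},
    ],
}
```

Here `pick_url(record, "de")` yields the German alternate, while any other language falls back to the plain `url`, so consumers that only read `url` keep working unchanged.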
First, thank you for the nice words on LT and HX.
Second, we agree that there are things which have one location only, and others which are language dependent.
  • One single entry is permitted for codebase, version number, Wikidata ID, probably programmer’s documentation and maintainer information (supposed to be available as one English version only) and material related to the implementation itself.
    • An API would be expected to take the language and/or project as a parameter inside a query; I do not know of any other case in the tool field. However, the MW API (both classic and REST) is local to the wiki it examines.
  • One-or-many entries are possible for end-user guidance, like user documentation, filtering mechanism, user feedback channels, accompanying stuff like screenshots, annotations, videos, etc.
  • In simple cases, there is one documentation page only, just English (or Japanese?), with both users and programmers as audience. However, the data model to be developed needs to paint the full picture, which is easy to narrow down to the single case if nothing else is available.
Greetings --PerfektesChaos (talk) 11:18, 30 June 2018 (UTC)

It's a bit unfortunate that the discussion is split on two talk pages on multiple topics. I've commented about these matters on Talk:Toolhub/Data_model#Comments_related_to_languages_and_translation. --Nikerabbit (talk) 13:09, 14 June 2018 (UTC)

Toolhub #Feedback suggested to put any feedback here on this page, with no exceptions. Greetings --PerfektesChaos (talk) 20:09, 19 June 2018 (UTC)

Great project!

I just wanted to say that I'm really happy that you're doing this project! The huge pile of unsorted, unfindable and messy tools, gadgets, etc. is in my opinion a big and important problem. This could be a huge improvement. After this we would only need Gadgets 2.0. ;) -- MichaelSchoenitzer (talk) 14:35, 8 June 2018 (UTC)

What I do

I edit articles, upload files, informally patrol vandalism, I organize people, online or offline, train new editors, or new trainers, assist veteran editors, write simple code, define words on Wiktionary, create items on Wikidata, refine items on Wikidata, offload data from Wikipedia to Wikidata, migrate files from Wikipedia to the Wikimedia Commons, approve Pending Changes, and so on.

One of my efforts is to make every article have the same feel and look as every other article: continuity of style and flow.

What tools do you have for me to improve my workflow? Having fun! Cheers! Checkingfax (talk) 04:13, 14 June 2018 (UTC)

Discontinued tools

One very useful piece of information would be whether the tool still exists. There are many tools that do not work any more (whatever the reason: deleted, not updated, incompatible with new mw config, etc.), but are still in Hay's tools. For someone searching for a tool for a specific task, it is very annoying to try 1, 2, 3 tools, all dead :(

--Hsarrazin (talk) 19:59, 21 June 2018 (UTC)

Hsarrazin, that is a good point. One idea of mine was to have a script regularly ping tools to check for uptime, but it sounds like that may not be enough for tools that are nominally "up" (return a 200 status code as opposed to a 404 or 500) but are effectively useless. Would the option to flag a tool as not working be helpful, do you think? Harej (WMF) (talk) 20:59, 21 June 2018 (UTC)
yes :) and perhaps notify the maintainer, or allow them to add a comment "I cannot maintain this tool anymore" or "this tool relied on WDQ, which has been disabled" (like https://tools.wmflabs.org/wikidata-nolabels/) or "This tool is now useless, due to such new feature", etc. --Hsarrazin (talk) 21:11, 21 June 2018 (UTC)
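The two signals discussed in this thread, an automatic uptime ping plus a manual "not working" flag or maintainer note, could be combined roughly as below. This is a sketch under assumptions: the function names and the three status labels are invented here, and a HEAD request is just one possible way to probe a tool.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code              # server answered with 4xx/5xx
    except (urllib.error.URLError, OSError):
        return None                  # DNS failure, refused connection, timeout


def tool_status(http_status, flagged_broken=False, maintainer_note=None):
    """Combine the ping result with human reports into one status label."""
    if http_status is None or http_status >= 400:
        return "down"
    # The tool answers, but a user flag or maintainer note says it is useless.
    if flagged_broken or maintainer_note:
        return "reported broken"
    return "up"
```

A tool that returns 200 but carries a note such as "this tool relied on WDQ, which has been disabled" would then be listed as "reported broken" rather than "up".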