[FRIAM] Sorting Algorithm? AI? Identifying "types" within data

Russ Abbott russ.abbott at gmail.com
Tue Jan 10 12:16:31 EST 2023


Interesting problem.

Eric, as you said earlier, K-means requires a way to measure the distance
between objects -- so that those with smaller distances can be grouped
together. A problem is that there are a number of features, which may not
be correlated. For example, there is an income trajectory, a
change-of-company trajectory, a change-of-level-of-responsibility
trajectory, a change-of-subject-matter-focus trajectory, and probably more.
You might build these separate trajectories for each person and then see
whether you can group the trajectories. For example, a "company man" may or
may not have an increasing responsibility trajectory. You would then have a
multi-dimensional space into which to put people.
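
A minimal sketch of that multi-trajectory idea, assuming each person's
trajectories have already been boiled down to a few summary numbers -- the
feature names below are hypothetical, not a prescription:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# One row per person; each column summarizes one trajectory, e.g.
# [income_slope, company_changes, responsibility_slope, focus_changes].
# Random numbers stand in for the real summaries here.
rng = np.random.default_rng(0)
people = rng.normal(size=(1000, 4))

# Standardize so no single trajectory dominates the distance measure.
X = StandardScaler().fit_transform(people)

# Group people in the resulting multi-dimensional space.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # how many people landed in each bucket

How many clusters to ask for is, of course, its own question.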

-- Russ


On Mon, Jan 9, 2023 at 10:11 AM Nicholas Thompson <thompnickson2 at gmail.com>
wrote:

> To my uneducated eye, this seemed like one of Jon’s problems.
>
> Sent from my Dumb Phone
>
> On Jan 7, 2023, at 6:23 AM, Frank Wimberly <wimberly3 at gmail.com> wrote:
>
> 
> This answer seems reasonable to me.  In 1967 I worked on Project Talent,
> which had some similar goals and data.  See
>
> https://en.m.wikipedia.org/wiki/Project_Talent
>
> Our data was for thousands of high school students, and our software was all
> written in Fortran.
>
> ---
> Frank C. Wimberly
> 140 Calle Ojo Feliz,
> Santa Fe, NM 87505
>
> 505 670-9918
> Santa Fe, NM
>
> On Fri, Jan 6, 2023, 11:32 PM Pieter Steenekamp <
> pieters at randcontrols.co.za> wrote:
>
>> I asked https://chat.openai.com/chat and here is the conversation:
>>
>> *Pieter Steenekamp*
>> can you suggest a solution for the following problem "I'm hoping someone
>> here could help out. Let's imagine I had some data where each row was a
>> person's career. We could list major events every year. For example: 2004
>> they were hired, 2007 they get a promotion, 2010 they leave for a
>> different company, 2012 they come back at a higher level, 2015 get a
>> promotion, then no change until 2022.  Let's say I had data like this for
>> roughly 2 million people, and that there are around 10 different types of
>> changes that could happen during any time period (could be yearly,
>> quarterly, monthly, I can make it how I want).  I was hoping we could ask a
>> computer to tell us if there were "types of careers" that people had. We
>> could say "put all these careers into 4 buckets" or "7 buckets" based on
>> similarity. Then we could look at the piles the computer made and try to
>> make sense of them. One type might be "company man" for people who tend to
>> stay in place for 20 or more years, another type could be a "rotator", who
>> leaves and returns every 3 years or so. Etc. The point is, I want a
>> computer to make the piles for me, rather than trying to come up with
>> potential piles a priori. Are there methods for doing this? I know it's a
>> problem we've *talked* about a lot, but I don't know if there are
>> solutions. "
>> *https://chat.openai.com/chat*
>> There are several approaches that you could use to group the careers of
>> these 2 million people into clusters based on similarity. One approach is
>> to use unsupervised machine learning techniques, specifically clustering
>> algorithms, to automatically group the data into clusters. Some popular
>> clustering algorithms that you could use for this problem include k-means,
>> hierarchical clustering, and DBSCAN (Density-Based Spatial Clustering of
>> Applications with Noise). You could also use dimensionality reduction
>> techniques, such as PCA (Principal Component Analysis) or t-SNE
>> (t-Distributed Stochastic Neighbor Embedding), to reduce the dimensionality
>> of the data and then use a clustering algorithm to group the data into
>> clusters. Another approach is to use supervised machine learning
>> techniques, where you would need to manually label a subset of the data
>> with the cluster labels, and then use this labeled data to train a
>> classifier to predict the cluster labels for the remaining data. It's worth
>> noting that these approaches will require some feature engineering, where
>> you will need to extract relevant features from the data that can be used
>> to represent the careers of the people in the dataset. Once you have
>> identified and extracted these features, you can then use them as input to
>> the machine learning algorithms to group the data into clusters.
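>> A rough sketch of the reduce-then-cluster route mentioned above, assuming
>> the engineered features already sit in a NumPy array (the random array
>> below is only a placeholder for real features):
>>
>> import numpy as np
>> from sklearn.preprocessing import StandardScaler
>> from sklearn.decomposition import PCA
>> from sklearn.cluster import KMeans, DBSCAN
>>
>> features = np.random.default_rng(1).normal(size=(5000, 12))  # placeholder
>>
>> X = StandardScaler().fit_transform(features)
>> X2 = PCA(n_components=2).fit_transform(X)  # compress correlated features
>>
>> kmeans_labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X2)
>> dbscan_labels = DBSCAN(eps=0.5, min_samples=20).fit_predict(X2)
>> print("k-means clusters:", len(set(kmeans_labels)))
>> print("DBSCAN clusters:", len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0))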
>> *Pieter Steenekamp*
>> would it then be possible to identify, for example, whether there were
>> "types of careers" that people had?
>> *https://chat.openai.com/chat*
>> Yes, once you have grouped the careers of the 2 million people into
>> clusters using one of the approaches I mentioned above, you can then
>> analyze the characteristics of each cluster to identify common patterns and
>> trends within each cluster. This will allow you to identify "types of
>> careers" that people have, based on the similarity of their career events.
>> For example, if one of the clusters is characterized by people who have had
>> long tenures at a single company, you could label this cluster as "company
>> man" careers. Similarly, if another cluster is characterized by people who
>> frequently switch between companies, you could label this cluster as
>> "rotator" careers. You can then use this information to gain insights into
>> the types of careers that are represented in the dataset, and potentially
>> draw conclusions about the factors that may influence career paths and
>> success.
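>> A small sketch of that interpretation step: compare per-cluster means of a
>> few engineered features (the feature names and random data below are
>> hypothetical):
>>
>> import numpy as np
>> import pandas as pd
>>
>> rng = np.random.default_rng(2)
>> df = pd.DataFrame(rng.normal(size=(5000, 3)),
>>                   columns=["avg_tenure_years", "employer_changes", "promotions"])
>> df["cluster"] = rng.integers(0, 4, size=len(df))  # stand-in cluster labels
>>
>> # A cluster with high avg_tenure_years and few employer_changes might be
>> # labeled "company man"; one with many employer_changes, a "rotator".
>> print(df.groupby("cluster").mean())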
>> *Pieter Steenekamp*
>> can you propose Python code to do something like this, please?
>> *https://chat.openai.com/chat*
>> Sure, here is an example of how you could use the scikit-learn library in
>> Python to group the careers of the 2 million people into clusters using the
>> k-means algorithm:
>> import numpy as np
>> from sklearn.cluster import KMeans
>> # Load the data into a NumPy array (one row of features per person)
>> data = np.loadtxt('career_data.txt')
>> # Create the k-means model
>> model = KMeans(n_clusters=4)
>> # Fit the model to the data
>> model.fit(data)
>> # Predict the cluster labels for each data point
>> labels = model.predict(data)
>> # Print the cluster labels
>> print(labels)
>> This code will group the data into 4 clusters, as specified by the
>> n_clusters parameter. You can adjust this parameter to change the number of
>> clusters that the data is grouped into.
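>> One common way to decide how many clusters to ask for (not covered in the
>> answer above) is to try several values of n_clusters and compare
>> silhouette scores, e.g.:
>>
>> import numpy as np
>> from sklearn.cluster import KMeans
>> from sklearn.metrics import silhouette_score
>>
>> data = np.random.default_rng(3).normal(size=(2000, 5))  # placeholder features
>> for k in range(2, 9):
>>     labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
>>     print(k, round(silhouette_score(data, labels), 3))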
>> To extract features from the data that can be used as input to the
>> k-means model, you will need to preprocess the data and select relevant
>> features. This will likely involve some feature engineering, which will
>> depend on the specific characteristics of the data and the goals of your
>> analysis.
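>> As one possible sketch of that feature-engineering step, assuming raw
>> records of the form (person_id, year, event_type) -- the column names and
>> sample rows are illustrative:
>>
>> import pandas as pd
>>
>> events = pd.DataFrame({
>>     "person_id": [1, 1, 1, 2, 2],
>>     "year": [2004, 2007, 2010, 2005, 2015],
>>     "event_type": ["hired", "promotion", "left_company", "hired", "promotion"],
>> })
>>
>> # One row per person, one column per event type, values = how often it happened.
>> counts = pd.crosstab(events["person_id"], events["event_type"])
>> # Add a simple span-of-career feature as well.
>> counts["career_span"] = events.groupby("person_id")["year"].agg(lambda y: y.max() - y.min())
>> print(counts)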
>> I hope this helps! Let me know if you have any questions.
>>
>> On Fri, 6 Jan 2023 at 19:34, Eric Charles <eric.phillip.charles at gmail.com>
>> wrote:
>>
>>> Greetings all,
>>> I'm hoping someone here could help out. Let's imagine I had some data
>>> where each row was a person's career. We could list major events every
>>> year.
>>>
>>> For example: 2004 they were hired, 2007 they get a promotion, 2010
>>> they leave for a different company, 2012 they come back at a higher level,
>>> 2015 get a promotion, then no change until 2022.
>>>
>>> Let's say I had data like this for roughly 2 million people, and that
>>> there are around 10 different types of changes that could happen during any
>>> time period (could be yearly, quarterly, monthly, I can make it how I
>>> want).
>>>
>>> I was hoping we could ask a computer to tell us if there were "types of
>>> careers" that people had. We could say "put all these careers into 4
>>> buckets" or "7 buckets" based on similarity. Then we could look at the
>>> piles the computer made and try to make sense of them.
>>>
>>> One type might be "company man" for people who tend to stay in place for
>>> 20 or more years, another type could be a "rotator", who leaves and returns
>>> every 3 years or so. Etc. The point is, I want a computer to make the piles
>>> for me, rather than trying to come up with potential piles a priori.
>>>
>>> Are there methods for doing this? I know it's a problem we've *talked*
>>> about a lot, but I don't know if there are solutions.
>>>
>>> Any help would be appreciated.
>>>
>>> Best,
>>> Eric
>>>
>>> <echarles at american.edu>
> -. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives:  5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
>

